This dissertation deals with the development of a fast and accurate simulation tool for
very-large-scale integrated (VLSI) circuits consisting of metal-oxide-semiconductor (MOS) transistors.
Such tools are called switch-level timing simulators, and they provide adequate information on the
performance of the circuits with a reasonable expenditure of computation time even for very large circuits.
The algorithms presented in this thesis can handle only n-channel MOS (NMOS) circuits, but are easily
extendible to handle complementary MOS (CMOS) circuits as well.

An  NMOS  circuit  is  modeled  as  a  set  of  nodes  connected  by  transistor  switches.  Three  strengths 
and  three  states  are  used  to  represent  the  signals  at  the  nodes  in  the  circuit.  The  strengths  in  decreasing 
order  are  input,  pullup,  and  normal.  The  three  states  used  are  0,  u,  and  1,  with  0  and  1  representing 
the conventional low and high signal values respectively, while the u state is used to represent
intermediate signal values and sometimes to represent situations of conflict. Each switch is either open,
closed, or in an intermediate state.

The  enhancement  transistors  in  the  NMOS  network  are  first  partitioned  into  driver  and  pass 
transistors.  The  NMOS  network  itself  is  then  partitioned  into  multifunctional  blocks  (MFB),  pass 
transistor blocks (PTB), and input sources (SRC). The partitioning is an automatic process that is
completely transparent to the user and can be performed in linear time. The partitioned blocks are then
ordered  for  processing  so  that,  whenever  possible,  a  block  is  scheduled  for  processing  only  after  all  its 
inputs  have  been  previously  processed.  Since  this  is  not  possible  for  blocks  forming  feedback  loops,  a 
novel  dynamic  windowing  scheme  is  used  to  schedule  such  blocks. 


The blocks in the partitioned network are then simulated at the switch level using graph
algorithms, producing so-called zero-delay ternary signal waveforms. The zero-delay signal transitions are
then delayed by using delay and filtering operators. The characteristics of the delay operator are
computed in a presimulation phase by simulating five different circuit primitives using an accurate circuit
simulator  such  as  SPICE2.  These  characteristics  are  stored  in  a  table.  During  the  simulation  a  circuit 
block  is  mapped  onto  one  of  the  five  primitives  and  appropriate  delay  values  are  obtained  by  fast  table 
lookup  techniques.  Several  factors  such  as  block  configuration,  loading,  device  geometries,  and  input 
slew  rates  are  taken  into  account  while  computing  the  delay  values. 

The algorithms presented in this thesis have been implemented in a computer program called
MOSTIM. In all the circuits simulated thus far, MOSTIM provides timing information that is accurate
to within 10% of that provided by SPICE2, while running approximately two orders of magnitude
faster.

CHAPTER 1


INTRODUCTION 


The design of an electronic circuit, traditionally, started with the designer who, with a mental
picture, translated his or her ideas into the form of a circuit schematic. This step relied heavily on the
human designer’s intuition, past experience, and knowledge to make reasonable approximations. This
was followed by the “breadboarding” phase, in which an actual prototype of the circuit was constructed
from discrete components interconnected by external wires and was tested. The performance of the
circuit, if not found satisfactory, was then improved by adjusting the circuit element values in a
somewhat trial-and-error fashion.

The advent of integrated circuits, however, has greatly changed the picture. There are several
steps involved in the design of a very large-scale integrated (VLSI) circuit, which may consist of
several hundreds of thousands of components, mainly transistors. The circuit designer first obtains a
very high-level functional description of the circuit based on specifications provided by the user. The
synthesis, often called the top-down process, translates this high-level description into various levels,
including the register level, the transistor level (or electrical level), etc., and terminates at the physical
mask level, i.e., the actual layout of the patterns of metal, semiconductor, and insulating material by
which the components and the interconnections are achieved. This is followed by the design
verification, or the bottom-up process, wherein a software tool called an extractor is first used to obtain a
circuit-level (or transistor-level) description from the physical layout. The breadboarding phase is
replaced by using a simulation tool to predict the performance of the circuit, which is then compared
with the user’s specifications, thus completing the so-called design loop. If the performance is not
satisfactory, certain changes are made and the whole process is then repeated. The total time spent in the
design loop is usually referred to as the turn-around time.

The  main  objective  of  the  VLSI  circuit  designer  is  to  obtain  designs  with  as  low  a  turn-around 
time  as  possible.  Computer-aided  design  tools  have  become  virtually  indispensable  at  various  steps  in 
the  design  process  to  perform  tasks  which  would  otherwise  take  a  very  long  time  if  they  were  done  by 
human  beings.  Using  silicon  compilers  can  speed  up  the  top-down  synthesis  process  considerably  since 
they  produce  the  mask  level  description,  straight  away,  from  the  functional  description  without  any 
human  intervention.  Certain  software  tools  known  as  design  rule  checkers  (DRC)  and  electrical  rule 
checkers (ERC) are also used. These perform the rather mundane tasks of checking whether the layout
satisfies all the design rules of the technology and whether there are any topological faults from the
electrical point of view, such as a floating node or a short between power and ground. There is,
however, a bottleneck in speeding up the bottom-up design verification process, which lies in the simulation
of the electrical behavior of the circuit. This bottleneck is due to the unavailability of a simulation
tool that is capable of accurately predicting the performance of an entire VLSI circuit at a reasonable
cost. The accuracy of a simulator is important, since otherwise the integrated circuit which is
fabricated and tested might turn out to perform rather unsatisfactorily. For large circuits (typically of the
kind  in  today's  VLSI  technology),  the  speed  of  simulation  is  equally  important  so  that  the  entire  circuit 
can  be  simulated  in  a  reasonably  small  amount  of  computation  time.  However,  as  we  shall  see  in 
Chapter  2  of  this  thesis,  speed  and  accuracy  of  a  simulator  are  often  conflicting  requirements  among 
existing  simulation  tools. 

In  this  dissertation  we  will  be  primarily  concerned  with  providing  a  fast  and  accurate  simulation 
tool  to  a  VLSI  circuit  designer  which  gives  adequate  information  on  the  performance  of  the  circuit 
with  a  reasonable  expenditure  of  computation  time  even  for  very  large  circuits.  In  Chapter  2  of  this 
thesis  we  will  review  some  of  the  existing  simulators  for  integrated  circuits  and  classify  them  into  two 
distinct categories, namely, analog simulators and digital simulators. Analog simulators treat an
electronic circuit as a continuous dynamical system with electrical signals such as voltages and currents.
Digital simulators, on the other hand, view the circuit as a digital network with signals occupying
discrete states such as low (0) and high (1). For small circuit blocks where analog voltage levels are
critical in evaluating circuit performance, or where strong coupling exists, analog circuit simulators
such as SPICE2 [1] and ASTAP [2] can be used to predict the performance of the circuit very accurately.
As the size of the circuit (number of components) increases, however, using these simulators is no longer
cost-effective. Several decomposition techniques have been used to speed up their performance and
have resulted in a new generation of analog simulators [3-15] which are, however, cost-effective only for
circuits limited to at most ten thousand devices.

The existing digital simulators [13-27] can be further divided into Boolean gate-level [13-18] and
switch-level [19-27] simulators. In the Boolean gate model a circuit consists of a set of logic gates
connected by unidirectional memoryless wires. The logic gates compute Boolean functions of their input
signals and transmit these values along the wires to the inputs of other gates. Each gate input has a
unique signal source. Information is only stored in the feedback paths of sequential circuits. The
Boolean gate model, however, cannot describe some of the newer technologies currently used in VLSI
circuit design, especially circuits with Metal-Oxide-Semiconductor (MOS) transistors. The MOS
transistor can be treated as a voltage-controlled switch with three terminals: drain, gate, and source. The
signal at the gate terminal controls the connection between drain and source terminals. Therefore, some
MOS pass transistor networks can implement combinational logic in ways that resemble relay contact
networks more closely than conventional logic gate networks. Dynamic memories using MOS devices
can store information without feedback paths by exploiting the capacitance of the wires (interconnect
region) and the gates of the transistors attached to them. A variety of bus structures can provide
multidirectional, multipoint communication. Thus, MOS circuits consist of bidirectional switching
elements connected by bidirectional wires with memory due to the interconnect and device capacitances
and hence cannot be modeled accurately by Boolean gate-level simulators.


A new class of digital simulators has recently emerged specifically for simulating MOS VLSI
circuits. These switch-level [19-27] simulators model an MOS circuit as a set of nodes connected by
transistor switches. Each node occupies one of a discrete set of states, namely 0, 1, or X (the
intermediate or unknown state), and each switch is either open, closed, or in an intermediate or unknown
state. These simulators can handle a variety of MOS configurations such as logic gates, pass transistors,
busses, and static and dynamic memory. Digital simulators, in general, operate at sufficient speeds to test
entire VLSI systems, since the circuit behavior is modeled at a logical rather than a detailed electrical
level. However, these simulators do not model the dynamics of the circuits properly and are often useful
only in predicting steady-state responses of the signals. Analog simulators, on the other hand, predict both
steady-state and transient responses fairly accurately, if the device models used are accurate, but are
cost-effective only for circuits with fewer than a few thousand components, which are considered small
in the present day VLSI technology.
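
As a small illustration of the ternary switch-level model described above, the following Python
sketch maps a gate signal to a switch state and combines switch states in series and in parallel. The
function names, and the rule in `ratioed_output` that a conducting pull-down network always wins over
the load, are assumptions made for this illustration; they are not taken from any of the cited simulators.

```python
# Switch states of the ternary switch-level model.
OPEN, CLOSED, INTERMEDIATE = "open", "closed", "intermediate"


def nmos_switch(gate):
    """Map a gate signal ("0", "1", or "X") of an n-channel transistor to a switch state."""
    return {"0": OPEN, "1": CLOSED}.get(gate, INTERMEDIATE)


def series(a, b):
    """Two switches in series conduct only if both conduct."""
    if OPEN in (a, b):
        return OPEN
    if a == b == CLOSED:
        return CLOSED
    return INTERMEDIATE


def parallel(a, b):
    """Two switches in parallel conduct if either conducts."""
    if CLOSED in (a, b):
        return CLOSED
    if a == b == OPEN:
        return OPEN
    return INTERMEDIATE


def ratioed_output(pulldown):
    """Assumed ratioed-logic rule: a closed pull-down gives 0, an open one gives 1."""
    return {CLOSED: "0", OPEN: "1"}.get(pulldown, "X")


def nor2(a, b):
    """Two-input NOR: pull-down transistors in parallel."""
    return ratioed_output(parallel(nmos_switch(a), nmos_switch(b)))


def nand2(a, b):
    """Two-input NAND: pull-down transistors in series."""
    return ratioed_output(series(nmos_switch(a), nmos_switch(b)))
```

Note that an unknown input does not always produce an unknown output: a NOR gate with one input at 1
is pulled low regardless of the other input, which is the kind of case a ternary model can resolve.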

The algorithms presented in this thesis have led to the development of a switch-level timing
simulator  for  MOS  VLSI  circuits.  This  simulator,  MOSTIM,  is  an  attempt  to  bridge  the  gap  between 
analog  and  digital  simulators.  It  performs  simulations  at  a  switch  level  and  hence  runs  at  speeds  close 
to  that  of  digital  simulators.  Further,  it  uses  a  delay  operator  to  delay  signal  transitions  accurately  and 
hence  provides  the  timing  accuracy  comparable  to  that  of  analog  simulators. 

MOSTIM uses three strengths and three states to represent node signal values. The strengths in
decreasing order are input, pullup, and normal. The three states used are 0, u, and 1, with 0 and 1
representing the conventional low and high signal values respectively, while the u state is used to represent
intermediate signal values and sometimes to represent situations of conflict. The input to MOSTIM is a
transistor-level circuit description in a SPICE2 input format. The program begins by partitioning the
entire MOS network into several functional blocks. The partitioning is an automatic process that is
completely transparent to the user. The partitioned blocks are then ordered for processing so that,
whenever possible, a block is scheduled for processing only after all its inputs have been previously
processed. Since this is not possible for blocks forming feedback loops, a novel dynamic windowing
scheme is used to schedule such blocks. The blocks are then processed at a switch level, producing
so-called zero-delay ternary signal waveforms. These zero-delay waveforms are first delayed suitably by
the delay operator and then filtered to produce realistic waveforms. MOSTIM, at present, handles
only n-channel MOS (NMOS) circuits, but the algorithms presented in this dissertation can be easily
extended to complementary MOS (CMOS) circuits as well.

In Chapter 3, the algorithms for partitioning the input network into various blocks and the
ordering of these blocks for processing are discussed. The input network to MOSTIM is assumed to consist of
voltage sources, NMOS transistors (both depletion and enhancement types), and a fixed capacitance
from each circuit node to ground. The key to the partitioning strategy is to divide the set of
enhancement transistors into driver transistors and pass transistors. A graph-theoretic algorithm achieves this
in computation time linear in the number of enhancement devices. The driver transistors are then
grouped together to form multifunctional blocks (MFB) and the pass transistors are grouped together to
form pass transistor blocks (PTB). A third type of block called input source (SRC) is created from the
voltage sources, clocks, etc. A directed graph G is then constructed with vertices corresponding to the
various circuit blocks, namely, MFB’s, PTB’s, and SRC’s, and directed arcs describing the
interconnections between them. A modified version of a depth-first search known as Tarjan’s algorithm [31] is
used to detect strongly connected components (SCC) in G. The vertices within an SCC correspond to
blocks forming feedback loops in the original circuit and are collapsed into single vertices, thus creating
an acyclic reduced graph. The vertices of the reduced graph are then placed in topological order for
processing.
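
The ordering step can be sketched in a few lines of Python. The sketch below is a textbook recursive
form of Tarjan’s algorithm, not the modified version used in MOSTIM, and the block names in the example
graph are hypothetical. It relies on the known property that Tarjan’s algorithm emits strongly connected
components in reverse topological order of the reduced graph, so reversing the output yields a valid
processing order.

```python
def tarjan_scc(graph):
    """Return the SCCs of a directed graph {vertex: [successors]}.

    Components are emitted in reverse topological order of the reduced graph.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the stack down to v.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def processing_order(graph):
    """Collapse feedback loops and list the resulting groups in topological order."""
    return [sorted(c) for c in reversed(tarjan_scc(graph))]


# Hypothetical block graph: MFB2 and MFB3 form a feedback loop.
blocks = {
    "SRC1": ["MFB1"],
    "MFB1": ["MFB2"],
    "MFB2": ["MFB3"],
    "MFB3": ["MFB2", "PTB1"],
    "PTB1": [],
}
order = processing_order(blocks)
```

In this example the two blocks in the feedback loop end up in one group, which is placed after the
blocks that drive it and before the block it drives.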

The algorithms for the switch-level simulation of multifunctional blocks and pass transistor
blocks are presented in Chapter 4. An MFB is a single-output, multiple-input, unidirectional block
whose steady-state output is a Boolean function of its inputs. A graphical technique using
internal-node eliminations is used to evaluate the state of the signal at the output, given the input signal states.
No attempt is made to evaluate signals at the internal nodes of the MFB. In the switch-level
simulation of a PTB, however, the signal at every node within the PTB is evaluated. The transistors in a PTB
are modeled as bidirectional switches whose conduction states (i.e., open, closed, or intermediate) are
controlled by the signal at the corresponding gate terminals. A strong node forces its state on a weaker
node connected to it via a path of conducting transistors at any given time instant. The algorithm is
quite similar to the one used in conventional switch-level simulators such as MOSSIM [19] except for
the interpretation of the u state (or X state as used in MOSSIM).
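
The rule that a strong node forces its state on weaker nodes can be illustrated with the following
Python sketch. It is a deliberate simplification and not the algorithm of Chapter 4 or of MOSSIM: only
switches that are definitely closed are treated as conducting, intermediate switches are ignored, and a
disagreement among the strongest nodes of a group is assumed to produce the u state. The function and
variable names are hypothetical.

```python
# Strength ordering from the text: input > pullup > normal.
STRENGTH = {"normal": 1, "pullup": 2, "input": 3}


def settle(nodes, switches):
    """Return the settled state of every node.

    nodes:    {name: (strength, state)} with state in {"0", "u", "1"}
    switches: [(node_a, node_b, closed)]; only closed switches conduct here
    """
    neighbours = {name: [] for name in nodes}
    for a, b, closed in switches:
        if closed:
            neighbours[a].append(b)
            neighbours[b].append(a)

    settled, seen = {}, set()
    for start in nodes:
        if start in seen:
            continue
        # Collect the group of nodes joined by conducting paths.
        group, frontier = [], [start]
        seen.add(start)
        while frontier:
            node = frontier.pop()
            group.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
        # The strongest nodes in the group decide; disagreement gives "u".
        top = max(STRENGTH[nodes[n][0]] for n in group)
        drivers = {nodes[n][1] for n in group if STRENGTH[nodes[n][0]] == top}
        winner = drivers.pop() if len(drivers) == 1 else "u"
        for n in group:
            # Input nodes are never overwritten.
            settled[n] = nodes[n][1] if nodes[n][0] == "input" else winner
    return settled
```

A node that is cut off from every stronger node by open switches forms a group of its own and simply
keeps its previous state, which corresponds to charge stored on the node capacitance.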

The switch-level simulation algorithms described in Chapter 4 generate zero-delay ternary
waveforms for each pull-up node in an MFB and each normal node in a PTB. A delay operator,
described in Chapter 5, is used to delay pairs of complete transitions (i.e., 0→u followed by u→1, or
1→u followed by u→0) in the zero-delay waveforms. The delay operator computes appropriate delay
values by taking several parameters into account, such as block configuration, loading, device
geometries, and input slew rates. For NMOS technology, knowing the delay characteristics of five
different circuit primitives is sufficient, within reasonable limits of accuracy, to compute delays through
any general MFB or PTB. These five primitives are simulated using an accurate circuit simulator such
as SPICE2 [1] or SLATE [3] for various device and circuit parameters, and the delay values are
extracted and stored in a delay table. This can be done in a presimulation phase. During simulation,
MOSTIM then maps an MFB or a PTB into one of the five primitives and obtains the appropriate delay
value through fast table lookup methods, and interpolation when necessary. Clearly, the delay values
are functions of various circuit and device parameters. However, using time scaling techniques, it will
be shown that only one parameter, namely, the input slew rate, is sufficient for determining delays in
three of the five primitives. The effect of the rest of the parameters can be accounted for by using
certain scale factors. For the remaining two primitives, however, there are three parameters necessary to
obtain delay values. Thus, time scaling helps reduce the size of the delay tables considerably.
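
For a primitive whose delay depends on a single parameter, the table lookup with interpolation can be
sketched as follows. This is a minimal illustration under stated assumptions, not the method of
Chapter 5: the table is taken to be a sorted list of (input slew rate, delay) pairs, interpolation is
linear, values outside the table are clamped to the end points, and a single multiplicative `scale`
argument stands in for the scale factors mentioned above. The numbers in the example table are made up.

```python
from bisect import bisect_right


def lookup_delay(table, slew, scale=1.0):
    """Interpolate a delay from a sorted list of (slew, delay) pairs.

    `scale` is a hypothetical stand-in for the time-scaling factors.
    """
    slews = [s for s, _ in table]
    if slew <= slews[0]:
        return scale * table[0][1]
    if slew >= slews[-1]:
        return scale * table[-1][1]
    # Find the two table entries that bracket the requested slew rate.
    i = bisect_right(slews, slew)
    (s0, d0), (s1, d1) = table[i - 1], table[i]
    return scale * (d0 + (d1 - d0) * (slew - s0) / (s1 - s0))


# Made-up table for one primitive, in arbitrary units.
example_table = [(1.0, 2.0), (2.0, 3.0), (4.0, 5.0)]
```

Because the table is indexed by one parameter only, its size grows linearly with the number of sample
points, which is the saving that time scaling is meant to provide.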

In Chapter 6 we discuss techniques used to process blocks within an SCC. In order to perform a
switch-level simulation of a block (MFB or PTB), the waveforms at the input nodes to the block must
necessarily be known. Since this is not possible for blocks within an SCC, these have to be handled
separately. A waveform relaxation technique could be used, wherein the blocks are processed
iteratively in a predetermined order with unknown input waveforms initially relaxed and output
waveforms constantly updated. Several drawbacks of this technique will be discussed. A new
dynamic windowing method which overcomes most of these drawbacks will be presented. In principle,
this new scheme is quite similar to the classical event-driven time-wheel approach used in conventional
logic simulators [13,19] except that events take place during intervals of time instead of occurring
instantaneously. The entire time interval of analysis is automatically partitioned into variable-size
windows such that the signal at each node in each block within the SCC occupies a steady state (i.e., 0
or 1) at the window boundaries. Associated with each window is a set of blocks scheduled for
processing during that window. This new scheme does not require an a priori ordering of blocks within the
SCC, and is also seen to take less computation time and less storage.

A number of NMOS circuits have been simulated using MOSTIM. The performance is discussed
in Chapter 7. In all the circuits simulated thus far, MOSTIM provides timing information that is
accurate to within 10% of that provided by SPICE2 [1], while running approximately two orders of
magnitude faster. The performance is also compared with some of the recent attempts made in
switch-level timing simulation such as RSIM [26]. Finally, in Chapter 8, we provide some conclusions
along with some suggestions for future research.

CHAPTER 2


OVERVIEW  OF  SIMULATION  TECHNIQUES 


Simulation plays a major role in the process of designing an integrated electronic circuit. By using
a simulator, the circuit designer can evaluate the performance of the design before going into the
expensive and time-consuming manufacturing process. There are two basic approaches to simulating an
integrated electronic circuit. The first, and more traditional, approach is to treat the circuit as a
continuous dynamical system and obtain a set of nonlinear algebraic-differential equations with electrical
variables such as voltage, current, and charge to describe its behavior. The objective of an analog
simulator is to solve this set of equations numerically and obtain the detailed waveforms at various nodes
in the circuit. An alternate approach is to view the circuit as a digital system in which the signals
occupy discrete states. Since the majority of VLSI circuits are primarily digital in nature, digital
simulators are often successful in predicting steady-state responses in these circuits. Analog simulators are
generally  quite  accurate  in  evaluating  the  performance  of  circuits,  but  are  not  fast  enough  to  handle 
entire  VLSI  circuits.  Digital  simulators,  on  the  other  hand,  are  able  to  simulate  very  large  circuits,  but, 
unfortunately,  are  not  accurate  in  modeling  the  dynamics  in  these  circuits. 


2.1  Analog  Simulation 

For small circuit blocks where analog voltage levels are critical to determine circuit performance,
or where strong coupling exists, circuit simulators such as SPICE2 [1] and ASTAP [2] can be used to
provide accurate information on the behavior of the circuit. These simulators will be referred to as
standard circuit simulators. These are general purpose simulators in that they can handle almost any type
of circuit element such as resistors, capacitors, inductors (both self and mutual), voltage and current
sources (independent and controlled), nonlinear devices (transistors, diodes, etc.), and transmission lines.
They can also perform many types of analyses such as dc analysis, ac (or small-signal) analysis, noise
analysis, and transient or time-domain analysis. In present day IC design, however, standard circuit
simulators are primarily used for time-domain transient analysis, which happens to be the most
complicated and expensive type of analysis.

The  transient  analysis  of  a  circuit  involves  the  solution  of  a  system  of  nonlinear  algebraic- 
differential  equations  describing  the  analog  behavior  of  the  circuit.  Standard  circuit  simulation 
involves,  essentially,  three  basic  numerical  methods  in  solving  the  circuit  equations: 

1.  An  implicit  integration  method  which  approximates  the  time-derivative  operator  in  the  system  of 
differential  equations  with  a  divided  difference  operator.  The  circuit  equations  are  thus 
transformed  into  a  sequence  of  nonlinear  algebraic  difference  equations. 

2.  The  Newton-Raphson  algorithm  for  solving  the  sequence  of  nonlinear  equations,  iteratively,  by 
generating  a  set  of  linear  algebraic  equations. 

3.  The Gaussian elimination method for finding the solution of a system of linear algebraic
equations.
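
The three methods listed above nest inside one another, as the following Python sketch illustrates on
a toy problem. The circuit is hypothetical and chosen only for this illustration: a source `vs` drives
node 1 through `r1`, node 1 connects to node 2 through `r2`, each node has a capacitor to ground, and
node 2 has a nonlinear conductor with the assumed characteristic i = k·v³. Backward Euler is used as the
implicit integration method, and the linear solver is a small dense elimination routine rather than the
sparse LU factorization used in production simulators.

```python
def gauss_solve(a, b):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def transient(vs=3.0, r1=1.0, r2=1.0, c1=1.0, c2=1.0, k=1.0, h=0.1, steps=200):
    """Backward Euler in time, Newton-Raphson per time step, Gaussian elimination per iteration."""
    v1, v2 = 0.0, 0.0
    for step in range(steps):
        v1_old, v2_old = v1, v2
        for iteration in range(50):
            # Node equations with dv/dt replaced by (v - v_old) / h.
            f = [c1 * (v1 - v1_old) / h + (v1 - vs) / r1 + (v1 - v2) / r2,
                 c2 * (v2 - v2_old) / h + (v2 - v1) / r2 + k * v2 ** 3]
            jacobian = [[c1 / h + 1 / r1 + 1 / r2, -1 / r2],
                        [-1 / r2, c2 / h + 1 / r2 + 3 * k * v2 ** 2]]
            dv1, dv2 = gauss_solve(jacobian, [-f[0], -f[1]])
            v1 += dv1
            v2 += dv2
            if max(abs(dv1), abs(dv2)) < 1e-12:
                break
    return v1, v2
```

With the default values the circuit settles to v1 = 2 and v2 = 1, which can be checked by hand from the
dc equations (vs − v1)/r1 = (v1 − v2)/r2 = k·v2³.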

The circuit simulator SPICE2 uses Modified Nodal Analysis (MNA) [32] to formulate the
circuit equations, whereas ASTAP uses the Sparse Tableau [33] approach. In either case, the time Tf spent
by the simulator to formulate the circuit equations grows almost linearly with the size of the circuit.
However, the time Ts required to solve these equations increases at a faster rate and rapidly becomes the
dominant cost of analysis. Moreover, most of Ts is spent in the Gaussian elimination process, which
involves the solution of a matrix equation of the form Ax = b, where A is the circuit Jacobian matrix, x
is a vector of unknown circuit variables, and b is a known source vector. In a typical large-scale circuit,
the matrix A is usually very sparse (i.e., it has very few nonzero elements). Hence, the Gaussian
elimination in standard circuit simulators is usually implemented by using sparse matrix methods [34]. It is
important to exploit the sparsity of the matrix A, since the computational time required to perform
Gaussian elimination of a full n×n matrix, using Crout’s algorithm [34], is proportional to n^3
(theoretically, better algorithms exist with smaller exponents [67]). In digital circuits, however, using sparse
matrix techniques [1], the Gaussian elimination has been empirically shown to take computational time
that is, on average, proportional to n^α, where α ∈ [1.2, 1.5].
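
To make the difference between these growth rates concrete, the following worked comparison, which
assumes nothing beyond the exponents quoted above, shows how the elimination time grows when the number
of equations n increases tenfold. The symbols T_dense and T_sparse are introduced here only for this
comparison.

```latex
\[
\frac{T_{\text{dense}}(10n)}{T_{\text{dense}}(n)} = 10^{3} = 1000,
\qquad
\frac{T_{\text{sparse}}(10n)}{T_{\text{sparse}}(n)} = 10^{\alpha}
\in \left[10^{1.2},\, 10^{1.5}\right] \approx [15.8,\, 31.6].
\]
```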

SPICE2 and ASTAP have proven to be reliable and effective when the size of the circuit,
measured by the number of components, is small. As the size of the circuit increases, the computer time and
storage space used up by these simulators increase rapidly despite the use of sparse matrix techniques.
In particular, the time Ts required to solve the circuit equations exhibits a nonlinear increase with
circuit size. In SPICE2, Ts is less than 10% of the total computation time for a circuit with less than 30
nodes but reaches almost half the total time for a circuit with a thousand nodes [13]. The problem is
further aggravated by the fact that for larger circuits, more information is generally needed to verify
the circuit performance, and hence, longer simulation times are required. It has been estimated that the
simulation of a circuit with around 10,000 MOS transistors from t = 0 to t = 1000 ns, using SPICE2 on an
IBM 370/168 computer, would take at least 30 hours of CPU time [55]. Since 30 hours is clearly
prohibitive, the cost-effective use of standard circuit simulators is limited to circuits with less than a
few hundred components, which are considered small in the present day VLSI technology.

2.2  Decomposition  Techniques  for  Analog  Simulation

Several attempts have been made to speed up the performance of standard circuit simulators.
This resulted in the development of a variety of analog simulators such as SLATE [3], MACRO [4],
MOTIS [5], MOTIS-C [6], PREMOS [7], RELAX [10], SPLICE [13], DIANA [14], and SAMSON [15].
These nonstandard analog simulators can be meaningfully classified according to the decomposition
techniques they employ to achieve the improvement in speed. Decomposition refers to
any technique that subdivides the original problem into several subproblems. Each subproblem
corresponds to solving only a subset of the original system equations for a subset of system variables.
Decomposition can be applied at any of the three levels of the standard circuit simulation approach,
namely, the differential equation level (sometimes called the time level), the nonlinear algebraic
equation level, or the linear algebraic equation level. The original system of equations is viewed by a
decomposition technique, no matter at what level it is applied, as a composition of several subsystems
with interactions among them. Each subsystem is usually solved in a manner similar to the
conventional techniques used in standard circuit simulators. Hence, the main feature of a decomposition
technique is the handling of the interactions between the various subsystems.

The  majority  of  large  integrated  circuits  are  digital  in  nature,  and  hence,  several  properties  of 
such circuits can be exploited during the simulation process. Digital circuits tend to be structurally
regular and repetitive. A typical large digital circuit is usually composed of a number of small
subcircuits, normally referred to as logic gates. Several of these logic gates are functionally and topologically
the  same,  and  thus  analyzing  one  is  very  similar  to  analyzing  the  others.  Furthermore,  only  a  small 
fraction  of  the  circuit  variables  is  actively  changing  state  at  any  time  instant  in  a  large  digital  circuit. 
For  circuits  containing  over  1000  transistors,  typically  more  than  80%  of  the  circuit  variables  are 
steady  (not  changing)  at  any  given  time  instant.  As  the  size  of  the  circuit  increases,  the  fraction  of 
active  (changing)  circuit  variables  tends  to  fall  even  further.  This  inactivity,  or  latency,  in  a  large 
digital  network  can  be  exploited  by  an  analog  simulator  in  a  number  of  ways.  The  main  advantages  in 
using  decomposition  techniques  are 

1.  The  structural  regularity  and  repetitivity  of  the  subsystems  can  be  exploited. 

2.  Incorporating  bypassing  schemes  at  several  levels  to  exploit  the  latency  of  a  subsystem  can  result 
in  additional  savings  in  computing  time. 

3.  Decomposition  techniques  are  suitable  for  computers  with  parallel  or  pipeline  architectures  since 
two  or  more  subsystems  can  be  solved  concurrently. 

There are two different approaches to achieving system decomposition, namely, tearing and
relaxation [36]. These two approaches are characterized by different ways of updating the interactions
between subsystems and by different numerical properties. The tearing approach aims to retain the
same numerical convergence and stability properties as the standard circuit simulation approach,
while the relaxation methods (also called temporal or indirect methods) have completely different
numerical properties.

2.2.1  Tearing  Decomposition 

Solving a network by tearing decomposition is an approach in which a part of the network is torn
away, so that the remaining subnetworks are disconnected and thus can be analyzed independently.
The solutions of the individual subnetworks are then combined with those of the torn-away part of the
network in order to obtain the solution of the entire network. There are basically two types of tearing,
namely, node-tearing and branch-tearing, depending upon whether circuit nodes or branches are
removed to tear down the network. The program SLATE [3] utilizes the node-tearing approach at the
linear equation level. The LU-factorization of the original Jacobian matrix during the standard
Gaussian elimination process is performed by cleverly exploiting the block structure of the matrix reordered
in a special form, thus achieving savings in computation time. Another approach is to decompose the
system at the nonlinear equation level by introducing additional iteration loops in the standard
Newton's method. This multilevel Newton method is used in MACRO [4]. Tearing methods, in
general, are well-suited for parallel processing and retain the numerical convergence and stability
properties of the standard approach.

2.2.1.1  Tearing  of  Linear  Systems 

At  the  linear  equation  level,  tearing  is  used  to  solve  a  set  of  linear  algebraic  equations  of  the  form 

Ax=b  (2.1) 

where $A$ is an $n \times n$ matrix, $x$ is an unknown vector, and $b$ is a known vector in $R^n$.


The standard Gaussian elimination process involves the LU-factorization of A such that A=LU,
where L and U are lower triangular and upper triangular matrices respectively. In general, we have
PA=LU, where P is a permutation matrix. This is followed by a forward substitution step wherein a
temporary vector y is first computed from

Ly=b  (2.2) 

after  which  x  is  computed  in  the  backward  substitution  step  from 

Ux=y.  (2.3) 


In  case  the  permutation  matrix  P  is  not  the  identity,  then  we  can  replace  the  known  vector  b  by  the 
vector  Pb  in  Equation  (2.2).  It  must  also  be  noted  that  Equations  (2.2)  and  (2.3)  can  be  solved  without 
explicitly  computing  matrix  inverses  since  the  corresponding  matrices  are  triangular.  However,  as  the 
size  of  the  matrix,  n,  becomes  large,  even  Gaussian  elimination  turns  out  to  be  prohibitively  expensive. 
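As a minimal sketch of the procedure just described, the following Python fragment factors PA = LU with partial pivoting and then performs the forward and backward substitutions of Equations (2.2) and (2.3). The function name `lu_solve` and the dense list-of-lists storage are assumptions made for illustration only; a production circuit simulator would use sparse-matrix techniques instead.

```python
def lu_solve(A, b):
    """Solve A x = b via PA = LU with partial pivoting (dense, pure Python)."""
    n = len(A)
    U = [row[:] for row in A]            # reduced in place to upper triangular U
    L = [[0.0] * n for _ in range(n)]    # unit lower triangular factor
    perm = list(range(n))                # row order representing the permutation P

    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda r: abs(U[r][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for r in range(k + 1, n):
            m = U[r][k] / U[k][k]
            L[r][k] = m
            for c in range(k, n):
                U[r][c] -= m * U[k][c]

    # Forward substitution: L y = P b (Equation (2.2) with b replaced by Pb).
    pb = [b[i] for i in perm]
    y = [0.0] * n
    for i in range(n):
        y[i] = pb[i] - sum(L[i][j] * y[j] for j in range(i))

    # Backward substitution: U x = y (Equation (2.3)).
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x
```

Neither substitution step forms a matrix inverse; each unknown is obtained from one row of a triangular system.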


Algebraically, tearing can be considered as reordering the network variables such that Equation
(2.1) has a bordered block diagonal (BBD) form

$$\begin{bmatrix} D & P \\ Q^T & T \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} r \\ s \end{bmatrix} \tag{2.4}$$

where $w \in R^k$ is the vector of tearing variables, $v \in R^m$ is the vector of the remaining unknown
variables, and $r$ and $s$ are the correspondingly reordered parts of $b$. $T$ is a $k \times k$ tearing matrix
corresponding to the variables in $w$. Removal of the variables in $w$ tears the network into $\mu$
independent subnetworks. $D$ is an $m \times m$ block diagonal matrix corresponding to these subnetworks.
Assuming that the $i$th subnetwork has $m_i$ variables and the $m_i \times m_i$ matrix corresponding to this
is $D_i$, we then get the following partition:

$$D = \mathrm{diag}(D_1, D_2, \ldots, D_\mu), \qquad v^T = [\, v_1^T \;\; v_2^T \;\; \cdots \;\; v_\mu^T \,], \qquad r^T = [\, r_1^T \;\; r_2^T \;\; \cdots \;\; r_\mu^T \,].$$

Further, $P^T = [\, P_1^T \;\; P_2^T \;\; \cdots \;\; P_\mu^T \,]$ and $Q^T = [\, Q_1^T \;\; Q_2^T \;\; \cdots \;\; Q_\mu^T \,]$, where $P_i$ and $Q_i$ are
$m_i \times k$ matrices constituting the border.

The solution strategy is to first eliminate the variables $v$ from the system, resulting in the
following reduced subsystem:

$$(T - Q^T D^{-1} P)\, w = s - Q^T D^{-1} r. \tag{2.5}$$

Solving (2.5) gives the tearing variables $w$, after which the $i$th subnetwork can be solved to yield
$v_i$ from

$$D_i v_i = r_i - P_i w \tag{2.6}$$

for each $i = 1, 2, \ldots, \mu$.

It must be noted that both Equations (2.5) and (2.6) represent subproblems much smaller than the
original problem since, typically, $k \ll n$ and $m_i \ll n$. Further, these equations can be solved without
actually inverting any of the matrices involved. The details are given in [35] and will not be discussed
here. Parallel processors could be employed to solve Equation (2.6) for different subnetworks. Thus
tearing aids in saving computation time over Gaussian elimination of a rather large system of linear
equations.
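The elimination order described above can be sketched as follows. This is an illustrative fragment only: the names `solve`, `matmul`, `transpose`, and `bbd_solve` are hypothetical, the right-hand side is assumed to be split into one vector per subnetwork plus a vector for the tearing variables, and small dense solves stand in for whatever factorization a real program would use. Each diagonal block is solved against its border columns and its own right-hand side, the reduced system in the tearing variables is assembled and solved, and the subnetwork unknowns are then recovered block by block.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is n x n and B is n x c, both stored as lists of rows."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]      # augmented matrix [A | B]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [e / piv for e in M[k]]
        for r in range(n):
            if r != k:
                f = M[r][k]
                M[r] = [a - f * c for a, c in zip(M[r], M[k])]
    return [row[n:] for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def bbd_solve(D_blocks, P_blocks, Q_blocks, T, rhs_blocks, s):
    """Solve a bordered block diagonal system one subnetwork at a time."""
    k = len(T)
    S = [row[:] for row in T]        # becomes T - sum_i Q_i^T D_i^{-1} P_i
    t = [[e] for e in s]             # becomes s - sum_i Q_i^T D_i^{-1} rhs_i
    partial = []
    for D, P, Q, rhs in zip(D_blocks, P_blocks, Q_blocks, rhs_blocks):
        # One solve per block with several right-hand sides: D [Z | z] = [P | rhs].
        sol = solve(D, [p_row + [r_j] for p_row, r_j in zip(P, rhs)])
        Z = [row[:k] for row in sol]
        z = [[row[k]] for row in sol]
        Qt = transpose(Q)
        QZ, Qz = matmul(Qt, Z), matmul(Qt, z)
        for a in range(k):
            t[a][0] -= Qz[a][0]
            for b in range(k):
                S[a][b] -= QZ[a][b]
        partial.append((Z, z))
    w = [row[0] for row in solve(S, t)]              # reduced system
    # Back-substitute: v_i = z_i - Z_i w, independently for every block.
    v_blocks = [[z[j][0] - sum(Z[j][a] * w[a] for a in range(k))
                 for j in range(len(z))] for Z, z in partial]
    return v_blocks, w
```

The two loops over the blocks touch one subnetwork at a time, which is the part that could be distributed over parallel processors.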


2.2.1.2 Tearing of Nonlinear Systems

At  the  nonlinear  equation  level,  tearing  is  applied  in  the  multilevel  Newton  iteration  procedure 
used  in  MACRO  [4].  In  this  approach  the  circuit  is  assumed  to  be  described  in  a  hierarchical  fashion. 
In  a  two-level  hierarchy,  a  circuit  is  composed  of  certain  functional  units,  called  blocks.  Each  block  is  a 
small  subnetwork  consisting  of  basic  circuit  elements  such  as  transistors,  resistors,  and  capacitors.  The 
circuit  variables  in  a  block  are  divided  into  two  categories,  namely,  endogenous  -  those  that  interact 
only  with  variables  inside  the  block,  and  exogenous  -  those  that  also  interact  with  variables  outside  the 
block. Let $u \in R^k$ denote the exogenous variables for a subcircuit. The endogenous variables are, in
turn, partitioned into two sets. The first set, called the output variables and denoted by $y \in R^k$, is in
1-1 correspondence with the exogenous variables. For example, if the exogenous variables are chosen to
be node voltages, then the set of output variables will be branch currents entering the subcircuit from
these nodes. The second set, denoted by $x \in R^m$, is the set of internal variables.


The static behavior of each subcircuit can be determined by solving a system of equations of the
form

$$H(u, x, y) = 0. \tag{2.7}$$

Given  u,  the  interaction  of  the  subcircuit  with  the  rest  of  the  circuit  is  completely  described  by  y. 
Thus  Equation  (2.7)  can  be  solved  to  yield  an  exact  macromodel  for  the  subcircuit,  which  is  a  mapping 
from  u  to  y.  Therefore  the  original  circuit  can  be  treated  as  composed  of  black  boxes  whose  input- 
output  behavior  is  modeled  by  macromodels,  leading  to  the  network  equations  of  the  form 

$$F(u, y, w) = 0 \tag{2.8}$$

where $w \in R^p$ is a vector of network variables not interacting with any of the subcircuits.

The two-level Newton-Raphson algorithm can then be described as follows. Each subcircuit
having equations of the form of Equation (2.7) is first solved using a Newton-Raphson iterative technique,
yielding $y$ as a function of $u$ denoted by $y = G(u)$. The next level of Newton-Raphson iterations is
applied to Equation (2.8) with $y = G(u)$ to yield the complete solution to the network.

The  two-level  technique  can  easily  be  extended  to  many  levels  of  hierarchy  in  the  circuit  and  is 
extremely  useful  if  circuits  are  described  in  a  multilevel  hierarchical  fashion.  The  main  advantage  in 
using  this  approach  is  that,  at  each  level,  the  Newton-Raphson  algorithm  is  applied  only  to  a  relatively 
small  number  of  equations,  thus  gaining  computational  speed.  Like  other  tearing  methods,  this  scheme 
permits  individual  subcircuits  to  be  processed  in  parallel  while  still  retaining  the  essential  properties  of 
the  corresponding  standard  technique,  which  in  this  case  is  the  quadratic  convergence  of  the  Newton- 
Raphson  method. 
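The two-level iteration can be illustrated on a deliberately tiny scalar problem. Everything specific in the sketch below is an assumption made for illustration: the block equations are taken to be x + x^3 = u with y = x, the network equation is taken to be u + y - 3 = 0, and the function names are hypothetical. The inner routine plays the role of the macromodel y = G(u) and also returns the sensitivity dG/du that the outer Newton-Raphson step needs.

```python
def inner_newton(u, x0=0.0, tol=1e-12, max_iter=50):
    """Lower level: solve the block equations for a given exogenous value u.
    Hypothetical block: x + x**3 - u = 0 (internal variable), y = x (output).
    Returns y = G(u) and the sensitivity dG/du."""
    x = x0
    for _ in range(max_iter):
        h = x + x ** 3 - u
        if abs(h) < tol:
            break
        x -= h / (1.0 + 3.0 * x ** 2)
    dy_du = 1.0 / (1.0 + 3.0 * x ** 2)   # implicit differentiation of x + x**3 = u
    return x, dy_du

def two_level_newton(u0=0.0, tol=1e-10, max_iter=50):
    """Upper level: solve the hypothetical network equation u + y - 3 = 0
    with y replaced by the macromodel G(u) of the block."""
    u = u0
    for _ in range(max_iter):
        y, dy_du = inner_newton(u)
        f = u + y - 3.0
        if abs(f) < tol:
            return u, y
        u -= f / (1.0 + dy_du)           # dF/du + dF/dy * dG/du
    raise RuntimeError("outer Newton iteration did not converge")
```

For this particular choice the solution is u = 2, y = 1. Each level only ever works on a single scalar equation, which mirrors the point made above about the small size of the systems handled at each level.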




2.2.2  Relaxation  Decomposition 

Relaxation or temporal decomposition techniques are used by several nonstandard analog
simulators, such as MOTIS [5], SPLICE [13], RELAX [10], and SAMSON [15], to achieve higher computational
speeds. Relaxation can also be applied at any of the three levels of the standard circuit simulation
approach, namely, the linear equation level, the nonlinear equation level, and the differential equation
level. These methods are characterized, however, by completely different numerical convergence and
stability properties.

2.2.2.1 Relaxation of Linear Systems

As in Section 2.2.1.1, suppose, once again, that the linear system of equations to be solved is of the
form $Ax = b$, where $x, b \in R^n$ and $A$ is an $n \times n$ matrix. There are two well-known relaxation
techniques that could be used to solve the above system iteratively. These are the Gauss-Jacobi method and
the Gauss-Seidel method. Both these methods are iterative in nature, as are relaxation methods in
general, and generate a sequence of vectors $x^0, x^1, x^2, \ldots, x^i, x^{i+1}, \ldots$, where $x^0$ is some initial guess.
This sequence converges to a solution $x^*$ for any initial guess, provided some conditions involving the
matrix $A$ are met. In this case the iterations stop when the error $\delta^{i+1} = \| x^{i+1} - x^i \| < \epsilon$, where $\epsilon > 0$
is preassigned.

The  relaxation  begins  by  partitioning  A  as 

A  =  L+D+U  (2.9) 

where L and U are strictly lower and strictly upper triangular matrices and D is a purely diagonal
matrix. Thus the original system of equations can be written as

Dx  =  b  —  Lx  —  Ux.  (2.10) 

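The Gauss-Jacobi method then computes $x^{i+1}$ from $x^i$ as

$$x^{i+1} = D^{-1}\left(b - (L+U)\,x^i\right) \tag{2.11}$$

while the Gauss-Seidel method computes

$$x^{i+1} = D^{-1}\left(b - L x^{i+1} - U x^i\right). \tag{2.12}$$

More precisely, Gauss-Seidel computes the $j$th component of $x^{i+1}$, as $j$ is incremented from 1 to $n$, as
follows:

$$x_j^{i+1} = \frac{1}{D_{jj}}\left(b_j - \sum_{k=1}^{j-1} L_{jk}\, x_k^{i+1} - \sum_{k=j+1}^{n} U_{jk}\, x_k^{i}\right) \tag{2.12a}$$

since $L_{jk} = 0$ for $k \ge j$ and $U_{jk} = 0$ for $k \le j$ by definition.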

From Equation (2.11) one gets

$$(x^{i+1} - x^i) = -D^{-1}(L+U)\,(x^i - x^{i-1})$$

and hence, for the Gauss-Jacobi method,

$$\delta^{i+1} \le \| D^{-1}(L+U) \| \, \delta^i \tag{2.13}$$

by definition of the induced norm of a matrix [38]. Similarly, for the Gauss-Seidel method one gets

$$\delta^{i+1} \le \| (L+D)^{-1} U \| \, \delta^i. \tag{2.14}$$

In either case, we have $\delta^{i+1} \le \| M \| \, \delta^i$, where $M$ denotes, generically, the matrices involved in
Equations (2.13) and (2.14). From the above equations, it can be shown that these relaxation methods have
the following properties:

a) The iterations converge (i.e., $\delta^i \to 0$ as $i \to \infty$) for any initial guess $x^0$ if and only if $|\lambda(M)| < 1$ for
each eigenvalue $\lambda$ of $M$.

b) The iteration converges in one step if the rows and columns of A are permuted such that U is
identically zero.

c) Speed of convergence, in most cases, is improved if A is permuted into nearly lower triangular
form.

d) In general, convergence depends on the numerical properties of L, D, and U. Convergence is
typically rapid for the first few iterations, and then gets progressively slower. The asymptotic rate of
convergence is linear.

e)  The  speed  of  convergence  of  the  Gauss-Seidel  method  is  generally  faster  than  that  of  the  Gauss- 
Jacobi  method. 

The advantage of the Gauss-Seidel method is that at each iteration only a triangular system of
equations has to be solved. Moreover, considerable improvement in speed of convergence can usually be
achieved if A can be permuted into a form which is nearly triangular. The disadvantage of this
method is its weak convergence. Even when convergence is achieved, it is only linear. Thus if M
has an eigenvalue of modulus close to 1, it may take many iterations to reduce the error by an order of
magnitude. If A is strictly diagonally dominant, which implies that all eigenvalues of M have modulus
strictly less than 1, then convergence is guaranteed.
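The two iterations can be compared directly with a short sketch. The function name `relax`, the zero initial guess, and the use of the maximum norm for the stopping test are assumptions made for illustration. The only difference between the two methods is which copy of the iterate is read while a sweep is in progress.

```python
def relax(A, b, method, eps=1e-10, max_iter=1000):
    """Return (x, iterations) using Gauss-Jacobi ("jacobi") or Gauss-Seidel ("seidel")."""
    n = len(b)
    x = [0.0] * n                        # initial guess x^0
    for it in range(1, max_iter + 1):
        old = x[:]                       # previous iterate x^i
        for j in range(n):
            # Gauss-Seidel reads components already updated in this sweep;
            # Gauss-Jacobi reads only the previous iterate.
            src = x if method == "seidel" else old
            s = sum(A[j][k] * src[k] for k in range(n) if k != j)
            x[j] = (b[j] - s) / A[j][j]
        delta = max(abs(x[j] - old[j]) for j in range(n))   # ||x^{i+1} - x^i||
        if delta < eps:
            return x, it
    raise RuntimeError("relaxation did not converge")
```

On a strictly diagonally dominant test matrix such as [[4, 1, 0], [1, 4, 1], [0, 1, 4]], both variants converge, and the Gauss-Seidel variant needs fewer sweeps, in line with property (e).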

2.2.2.2 Relaxation of Nonlinear Systems

Relaxation methods to solve nonlinear difference equations are used in a class of analog
simulators known as timing simulators [5-8]. The algorithms used in these simulators depart radically from
the methods used in standard circuit simulators in a number of ways, some of which are

1) The types of networks are restricted to circuits containing only MOS transistors and lumped
capacitors from each node to ground.

2) The nonlinear device characteristics, in most cases, are stored in tables, and are not evaluated
analytically during simulation.

3) Both sparse Gaussian elimination and conventional Newton-Raphson techniques are discarded as
solution methods and some accuracy may be sacrificed in the quest for speed.


The first timing simulator to be implemented was MOTIS [5] which, in fact, is still considered a
landmark in the Computer-Aided Design (CAD) area. The original MOTIS, as implemented, had some
problems with accuracy, convergence, and coupling such as floating capacitors (i.e., a capacitor across
two nodes). Several simulators, such as MOTIS-C [6], SPLICE [13], and MOTIS-II [7], were implemented
subsequently to overcome some of these problems. To elucidate some of the ideas used in these
simulators, assume that the nodal equations of an MOS network are of the form

$$C\dot{v} + J(v) = 0 \tag{2.15}$$

where $v \in R^m$ is the vector of node voltages as a function of time, $\dot{v}$ is its time derivative, $C$ is the
capacitance matrix, and $J(v)$ is the vector of currents feeding the capacitors. Using the Backward Euler
method to discretize the time derivative operator, we get

$$\dot{v}^{n+1} = (v^{n+1} - v^n)/h_n \tag{2.16}$$

where $v^k$ is the value of $v$ computed at time $t_k$, and $h_k = t_{k+1} - t_k$. Assuming that the values of $v$
have been computed at time points $t_0, \ldots, t_n$, we now develop the procedure to evaluate $v^{n+1}$.
Substituting Equation (2.16) into Equation (2.15) and denoting the unknown variable $v^{n+1}$ by $y$, we
get

$$Cy + h_n J(y) - Cv^n = 0, \tag{2.17}$$

which, in general, can be rewritten as a system of nonlinear equations of the form

$$\begin{aligned} g_1(y_1, y_2, \ldots, y_m) &= 0 \\ g_2(y_1, y_2, \ldots, y_m) &= 0 \\ &\;\;\vdots \\ g_m(y_1, y_2, \ldots, y_m) &= 0 \end{aligned} \tag{2.18}$$


The relaxation techniques used to solve the above equations are often termed point-wise
relaxation methods, as opposed to waveform relaxation methods [9], wherein the relaxation is applied at the
differential equations level itself. The point-wise relaxation techniques solve the equations in (2.18) by
sweeping one equation at a time and solving for one variable at a time while relaxing the remaining
variables to their previous values. The process is repeated until the unknown variables converge or the
iteration count exceeds a preset value. In MOTIS a Gauss-Jacobi-like scheme is used to solve the equations
in (2.18) approximately by obtaining $y_i$ from the following scalar equation:


$$g_i(v_1^n, v_2^n, \ldots, v_{i-1}^n, y_i, v_{i+1}^n, \ldots, v_m^n) = 0 \tag{2.19}$$


It must be pointed out that the above nonlinear scalar equation could be solved using a Newton-
Raphson iterative procedure. In MOTIS, however, the solution is taken to be the value obtained after
the first iteration itself. Furthermore, the values of $y_i$ obtained after the first sweep of the equations
in (2.18) are taken to be the values of $v_i^{n+1}$ and, once again, the iterations are not carried out until
convergence. Thus the algorithms in MOTIS compute a vector $y$ which solves the equations in (2.18)
approximately, and set $v^{n+1} = y$. These approximations are justified when sufficiently small time
steps are taken to discretize the equations in (2.17).

The MOTIS-C program [6] modifies the procedure used in MOTIS by using a Gauss-Seidel-like
approach, which computes $y_i$ from the following equation:

$$g_i(v_1^{n+1}, v_2^{n+1}, \ldots, v_{i-1}^{n+1}, y_i, v_{i+1}^n, \ldots, v_m^n) = 0 \tag{2.20}$$


Once again, the above nonlinear scalar equation is solved only approximately by stopping after a single
Newton-Raphson step. Furthermore, only a single relaxation sweep is taken through the equations in
(2.18). In SPLICE [13] this approach is modified by repeatedly sweeping through the equations in
(2.18) until convergence is achieved or until the number of iterations exceeds an a priori
bound, in which case the time step $h_n$ is reduced and the process is repeated. The advantage of using a
Gauss-Seidel-like approach over the Gauss-Jacobi-type approach used in MOTIS is that, usually, the
Gauss-Seidel iterations converge more rapidly.
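The three variants described above differ only in which neighbor values are used and in how many sweeps are taken, which the following sketch tries to make explicit. It is not the code of any of the named programs: the two-node circuit, its current function `J`, the diagonal capacitance list `C`, and the function `time_step` are hypothetical, and the time-step reduction that follows a failed convergence test is omitted.

```python
def J(v):
    """Node currents of a hypothetical two-node nonlinear circuit."""
    return [(v[0] - 1.0) + 0.5 * (v[0] - v[1]) + 0.1 * v[0] ** 3,
            0.5 * (v[1] - v[0]) + 0.1 * v[1] ** 3]

def dJ_diag(v):
    """Diagonal derivatives dJ_i/dv_i needed by the scalar Newton-Raphson step."""
    return [1.5 + 0.3 * v[0] ** 2, 0.5 + 0.3 * v[1] ** 2]

C = [1.0, 1.0]   # grounded node capacitances (diagonal capacitance matrix)

def time_step(v_n, h, seidel, max_sweeps=1, tol=1e-9):
    """Advance one Backward Euler step by point-wise relaxation of (2.18).

    seidel=False, max_sweeps=1 : Gauss-Jacobi-like, one Newton step per equation
    seidel=True,  max_sweeps=1 : Gauss-Seidel-like, one Newton step per equation
    seidel=True,  max_sweeps>1 : repeated sweeps until the iterate stops changing
    """
    y = v_n[:]
    for _ in range(max_sweeps):
        prev = y[:]
        for i in range(len(y)):
            # Values seen by equation i: current sweep (Seidel) or previous sweep (Jacobi).
            src = y if seidel else prev
            g = C[i] * src[i] + h * J(src)[i] - C[i] * v_n[i]   # g_i at the current point
            dg = C[i] + h * dJ_diag(src)[i]                     # d g_i / d y_i
            y[i] = src[i] - g / dg                              # single Newton-Raphson step
        if max(abs(a - b) for a, b in zip(y, prev)) < tol:
            break
    return y
```

Starting from v = [0, 0] with h = 0.1, the Gauss-Jacobi-like single sweep leaves the second node untouched during the first step, because it only sees the old value of the first node, whereas the Gauss-Seidel-like sweep already moves it. With repeated sweeps the result satisfies Equation (2.17) to within the chosen tolerance.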

The program PREMOS [8] uses a modified Gauss-Seidel predictor algorithm for the solution of the
equations in (2.18). In this approach, while solving the $i$th equation for the variable $y_i$, the previous
variables are updated, i.e., $y_j = v_j^{n+1}$ for $j < i$, while the variables with $j > i$ are predicted by
$y_j = v_j^n + (v_j^n - v_j^{n-1})\, h_n / h_{n-1}$. Among all the various time-point relaxation methods discussed
above, the Gauss-Seidel method with prediction is seen to perform the best, provided sufficiently small
time steps are taken. Also, experience with SPLICE [13] and MOTIS [5] has shown that repeated iteration
sweeps are required in order to achieve accuracy. The convergence and stability properties of these
methods are studied in some detail in [36].

2.2.2.3 Relaxation of Differential Equations

In this section we discuss a technique in which relaxation is applied directly to the system of
nonlinear algebraic-differential equations describing the circuit. As a result, the system is decomposed into
several decoupled subsystems of nonlinear algebraic-differential equations, each of which can then be
solved using standard techniques, namely, stiffly stable, implicit numerical integration methods,
Newton-Raphson iterations, and sparse Gaussian elimination. Furthermore, this type of decomposition
allows the latency of the subsystems to be exploited in the most natural way. This relaxation
technique is called the Waveform Relaxation Method (WRM) [9] and is used in the simulator RELAX [10].

In  order  to  describe  the  WRM  process,  consider  the  nonlinear  algebraic-differential  equations 
describing  the  behavior  of  any  general  circuit  to  be  of  the  form 

$$f(\dot{x}(t), x(t), u(t)) = 0 \tag{2.21a}$$

$$E(x(0) - x_0) = 0 \tag{2.21b}$$

where $t \in [0,T]$ is the independent time variable, $x(t) \in R^p$ is the vector of unknown variables at time
$t$, $\dot{x}(t)$ is the time derivative of $x$ at time $t$, $u(t) \in R^r$ is the vector of input variables at time $t$, $x_0 \in R^p$ is
the given initial value of $x$, $f : R^p \times R^p \times R^r \to R^p$ is a continuous function, and $E \in R^{n \times p}$ is a matrix of
rank $n \le p$, such that $Ex(t)$ is the state of the circuit at time $t$. Alternatively, the vector function $x(t)$,
$t \in [0,T]$, can be treated as an element $x$ in the vector space of bounded functions $L_\infty^p[0,T]$ with the
norm defined as


$$\| z \| = \max_{j=1,2,\ldots,p} \; \max_{t \in [0,T]} | z_j(t) | \tag{2.22}$$

where $z_1, z_2, \ldots, z_p$ are the scalar components of $z$.

There are two major processes involved in the WRM algorithm for solving the equations in (2.21)
over a given time interval $[0,T]$, namely, the assignment-partition process and the relaxation process.
In the assignment-partition process, each unknown variable is assigned to an equation in which it is
involved. Then the system of equations in (2.21a) is partitioned into $m$ disjoint subsystems of
equations of the following form, in which the dependence on time is not explicitly shown:


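$$\begin{aligned} f_1(\dot{x}_1, x_1, d_1, u) &= 0 \\ f_2(\dot{x}_2, x_2, d_2, u) &= 0 \\ &\;\;\vdots \\ f_m(\dot{x}_m, x_m, d_m, u) &= 0 \end{aligned} \tag{2.23a}$$

$$E(x(0) - x_0) = 0 \tag{2.23b}$$

where, for each $i = 1, 2, \ldots, m$, $x_i \in R^{p_i}$ is the subvector of unknown variables assigned to the $i$th
partitioned subsystem, $f_i : R^{p_i} \times R^{p_i} \times R^{2p-2p_i} \times R^r \to R^{p_i}$ is a continuous function, and

$$d_i = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m, \dot{x}_1, \ldots, \dot{x}_{i-1}, \dot{x}_{i+1}, \ldots, \dot{x}_m).$$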

For the $i$th subsystem, $x_i$ is the vector of endogenous variables, while $x_j$, with $j \ne i$, are the vectors
of exogenous variables. If, for each $i = 1, 2, \ldots, m$, the vector $d_i$ is treated as an input to the $i$th
subsystem, then clearly, the solutions of the equations in (2.23a) can be obtained by solving the $m$
subsystems independently. Therefore, the vector $d_i$ is called the decoupling vector for the $i$th subsystem.

The relaxation process starts with an initial guess of the waveforms for each unknown variable
and solves the equations in (2.23) iteratively. During each iteration, each subsystem is solved for its
endogenous variables for the entire time interval $[0,T]$ by using approximated waveforms for its
decoupling vectors. If we use the superscript $k$ to denote vectors obtained during the $k$th iteration, then the
WRM algorithm can be described as starting with an initial guess of waveforms $x^0(t) : t \in [0,T]$ such
that $x^0(0) = x_0$, and sweeping through the equations in (2.23a) one by one such that during the $k$th
iteration, the waveforms $x_i^k(t) : t \in [0,T]$ are obtained by solving

$$f_i(\dot{x}_i^k, x_i^k, d_i^k, u) = 0 \tag{2.24a}$$

$$E_i(x_i^k(0) - x_{i0}) = 0 \tag{2.24b}$$


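where, if Gauss-Seidel relaxation is used, then the decoupling vectors are taken as

$$d_i^k = (x_1^k, \ldots, x_{i-1}^k, x_{i+1}^{k-1}, \ldots, x_m^{k-1}, \dot{x}_1^k, \ldots, \dot{x}_{i-1}^k, \dot{x}_{i+1}^{k-1}, \ldots, \dot{x}_m^{k-1})$$

or, if Gauss-Jacobi relaxation is used, then

$$d_i^k = (x_1^{k-1}, \ldots, x_{i-1}^{k-1}, x_{i+1}^{k-1}, \ldots, x_m^{k-1}, \dot{x}_1^{k-1}, \ldots, \dot{x}_{i-1}^{k-1}, \dot{x}_{i+1}^{k-1}, \ldots, \dot{x}_m^{k-1}).$$

The iterations stop when the error $\delta^k = \| x^k - x^{k-1} \|$ becomes sufficiently small, where the norm of
the vector of waveforms is defined in Equation (2.22) above.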
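A minimal sketch of Gauss-Seidel waveform relaxation is given below for a hypothetical pair of coupled linear equations, x1' = -2 x1 + x2 and x2' = x1 - 2 x2 with x1(0) = 1 and x2(0) = 0. The example system, the fixed-step Backward Euler discretization, and the function names are assumptions chosen to keep the illustration short. Each subsystem is integrated over the whole interval while the other waveform is held at its latest available approximation, and the sweeps stop when the waveforms no longer change in the maximum norm.

```python
def waveform_relaxation(T=2.0, steps=200, tol=1e-10, max_iter=200):
    """Gauss-Seidel waveform relaxation for the two-variable example system."""
    h = T / steps
    # Initial waveform guess: hold the initial conditions over [0, T].
    x1 = [1.0] * (steps + 1)
    x2 = [0.0] * (steps + 1)
    for k in range(1, max_iter + 1):
        old1, old2 = x1[:], x2[:]
        # Subsystem 1 over the whole interval, x2 taken from iteration k-1.
        for n in range(steps):
            x1[n + 1] = (x1[n] + h * old2[n + 1]) / (1.0 + 2.0 * h)
        # Subsystem 2 over the whole interval, x1 taken from iteration k.
        for n in range(steps):
            x2[n + 1] = (x2[n] + h * x1[n + 1]) / (1.0 + 2.0 * h)
        delta = max(max(abs(a - b) for a, b in zip(x1, old1)),
                    max(abs(a - b) for a, b in zip(x2, old2)))
        if delta < tol:
            return x1, x2, k
    raise RuntimeError("waveform relaxation did not converge")

def coupled_backward_euler(T=2.0, steps=200):
    """Reference: the same discretization applied to the coupled system directly."""
    h = T / steps
    a, b = 1.0, 0.0
    det = (1.0 + 2.0 * h) ** 2 - h * h
    for _ in range(steps):
        a, b = (((1.0 + 2.0 * h) * a + h * b) / det,
                ((1.0 + 2.0 * h) * b + h * a) / det)
    return a, b
```

At convergence the relaxed waveforms agree with the directly coupled Backward Euler solution on the same time grid, since both satisfy the same discretized equations.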


In contrast to the conditions for convergence of point-wise relaxation methods discussed in the
previous sections, it has been shown by Lelarasmee [9] that the conditions for convergence of the
waveform relaxation method are fairly mild. First, the circuit Equations (2.21a) and (2.21b) are
transformed into a canonical form so that the error after the $k$th iteration can be expressed as a
function of the error after the previous iteration in the form of a contraction mapping. If the initial
waveform guesses and the inputs are all piecewise continuous, and the canonical functions are globally
Lipschitz continuous and contractive, then it is shown in [9] that uniform convergence is guaranteed
for the WRM algorithm under the norm defined in Equation (2.22). The convergence, however, is
linear as in other relaxation methods.

In spite of the surprisingly mild conditions for convergence, which are easily satisfied by most
practical electronic circuits, the WRM procedure implemented in simulators such as RELAX [10] and
RELAX2 [11] suffers from certain drawbacks. The main drawback is that if fairly strong coupling
exists between the various partitioned subsystems, as in circuits with logic feedback loops such as finite
state machines, asynchronous sequential circuits, and ring oscillators, the number of iterations required
for convergence may be prohibitively large and also proportional to the length of the interval of
analysis. Some of the drawbacks have been overcome in RELAX2.1 [12], wherein the time interval
$[0,T]$ is partitioned into certain slots or windows and the subsystems are analyzed only for the duration
of the present window before moving on to the next window, and so on. However, it has been shown in
[37] that, in the case of stiff systems where the coupling among the subsystems causes the stiffness, the
sizes of the windows have to be reduced considerably in order to keep the iteration count during a
window within a prescribed bound. This would then require an extremely large number of windows to
span the entire time interval of analysis.

2.3  Digital  Simulation 

Digital simulators [13-26], or logic simulators as they are often called, form an important class of
computerized tools for designing very large integrated circuits. These simulators provide a discrete
"on/off" type analysis of the circuit under test. Signal values are described by a fairly small number of
discrete levels rather than in a continuous range as is the case in an analog simulator. Through the use
of very simple models for the devices and Boolean arithmetic to perform operations on the discrete
signal values, digital simulators are often capable of economically analyzing circuits containing the
equivalent of over 100k active devices. The dynamics of the circuit are, however, modeled by simply
delaying the various signal transitions between the discrete levels. In most cases a simple, user-defined
rise and fall delay between the input and output of a logic gate or transistor group is used. Thus
digital simulators, at best, provide a fairly crude, first-order timing analysis of the circuit under consideration.


Digital simulators are useful and popular since most integrated circuits are primarily digital in
nature. The usefulness of a digital simulator, however, depends greatly on the consistency and
accuracy with which it can model the logic behavior of a full range of design techniques available to the
designers of integrated circuits. Of course, no digital simulator can model all designs with complete
accuracy, because it does not simulate the detailed analog behavior of the circuit. It should, nonetheless,
provide as close a model as possible within a set of well-defined limitations. As a further requirement,
a digital simulator for VLSI circuits must be efficient enough to simulate entire systems with
reasonable speed. A digital simulator has, as its basis, an abstract model of how digital systems function. This
logical model describes both the structure and the behavior of a system in terms of a set of primitive
elements, a set of interconnections, and a set of rules for operation. For a simulator to accurately and
reliably simulate a system, the logical model must reflect its actual structure and operation. Digital
simulators can be divided into two categories, namely, Boolean gate-level simulators [13-18] and
switch-level simulators [19-27].

2.3.1  Gate-level  Simulation 

The  Boolean  logic  gate  model  has  formed  the  theoretical  basis  for  logic  design  ever  since  the 
advent of electronic logic. In this model a circuit is composed of several logic gates connected by
unidirectional, memoryless wires. The logic gates themselves are collections of transistors and/or other
circuit elements which perform a logic function. A logic gate may be a simple inverter, NAND gate, or
NOR gate, or a more complex functional unit such as a flip-flop or a register. The logic gates compute
Boolean functions of their input signals and transmit these values along wires to the inputs of other
gates to which they might be connected. Each gate input has a unique signal source. Information is stored
only in feedback paths of sequential circuits. The Boolean gate model directly implements the well-
known two-valued Boolean algebra and hence has a well-defined specification which can guide the
simulator implementation.

The unilateral nature of logic gates is fundamental to the operation of gate-level simulators. For
each binary vector at the input nodes of a logic gate, the binary value (i.e., 0 or 1) at the output is
computed and propagated on to the inputs of other gates that might be connected to it. Due to the
inertial elements such as node capacitances present in the circuit, however, a change in the state of the input
to a gate would propagate to the output only after a certain time delay. Simulators which do not
account for this delay can analyze only combinational circuits. Thus, simulators which handle
sequential circuits must estimate the propagation delay through a logic gate, and they do so in several ways.
Some simulators operate in the so-called unit delay mode, where all logic gates are assumed to have the
same delay. Unit-delay simulators, however, can verify only the steady-state behavior or the logic
functionality of the digital circuit. In order to provide some kind of timing information, some
simulators allow assignable delays, where the user can assign specific delays through any of the logic gates
used in the simulation. Even in assignable delay simulators, the delay values may only be integer
multiples of a fundamental time quantum, usually referred to as the minimum resolvable time (MRT). For
example, the MRT in a certain simulator may be 0.1 ns, in which case a gate delay of 10 units
represents an effective delay of 1.0 ns.
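The assignable-delay mode can be sketched with a small event queue in which all times are integer multiples of the MRT. The data layout, the names `to_units`, `inv`, and `simulate`, and the pure transport-delay behavior (no inertial filtering of short pulses) are assumptions made for illustration and do not describe any particular simulator.

```python
import heapq

MRT_NS = 0.1   # minimum resolvable time in ns (the example value used in the text)

def to_units(delay_ns):
    """Express a delay as an integer multiple of the MRT."""
    return round(delay_ns / MRT_NS)

def inv(a):
    """Two-valued inverter."""
    return 1 - a

def simulate(gates, values, stimuli, t_end):
    """Event-driven simulation with assignable gate delays.

    gates   : output node -> (function, list of input nodes, delay in MRT units)
    values  : dict of initial node values (updated in place)
    stimuli : list of (time in MRT units, node, value) input events
    Returns the list of value changes as (time, node, value) tuples."""
    events = list(stimuli)
    heapq.heapify(events)
    history = []
    while events:
        t, node, val = heapq.heappop(events)
        if t > t_end:
            break
        if values[node] == val:
            continue                     # no change, nothing to propagate
        values[node] = val
        history.append((t, node, val))
        # Re-evaluate every gate driven by this node and schedule its output.
        for out, (fn, ins, delay) in gates.items():
            if node in ins:
                new = fn(*(values[i] for i in ins))
                heapq.heappush(events, (t + delay, out, new))
    return history
```

For a chain of two inverters with delays of 1.0 ns and 0.5 ns, `to_units` gives 10 and 5 units, so a rising input at time 0 produces a change on the first output at 10 units and on the second at 15 units.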

The difference in propagation delays through different signal paths in a network of logic gates
may sometimes cause undesirable situations, such as static hazards and dynamic hazards. Hazards
[28,29,39,40,64] are situations where it is possible for spurious glitches or spikes to appear in an
otherwise smooth analog waveform at the output of a logic gate. In a sequential circuit, the occurrence of a
glitch could cause the circuit to malfunction. Therefore, the detection of hazards and race conditions
[23,62,65] is very important, and hence, most digital simulators caution the user when they occur.
The detection of hazards is possible by introducing a third state, usually denoted by X, to represent
signal transitions [28,29,39,40,64]. In this dissertation, we do not consider race conditions since we assume
that timing is known, and hence any potential race condition will be resolved according to the timing.


The  Boolean  gate  model  cannot  represent  many  of  the  design  techniques  currently  used  in  VLSI 
design.  This  is  especially  true  in  the  case  of  MOS  VLSI  circuits.  The  MOS  pass  transistor  is  often  used 
to  implement  combinatorial  logic  in  ways  which  resemble  relay  contact  switches  more  closely  than 
logic  gates.  These  bidirectional  elements  are  difficult  to  handle  using  the  gate  model  and  are  often 
approximated  by  unidirectional  gates.  Dynamic  memory  can  store  information  without  feedback  paths 
by  exploiting  the  capacitances  of  the  wires  and  the  gate  terminals  of  the  transistors  attached  to  them. 
A  variety  of  bus  structures  is  often  used  to  provide  multidirectional,  multipoint  communication. 
Hence,  most  existing  digital  simulators  extend  the  Boolean  gate  model  in  various  ways  to  handle  MOS 
circuits. 

Many  simulators  extend  the  two-valued  logic  of  Boolean  algebra  with  a  third  value  to  represent 
an  unknown  or  undefined  logic  level.  This  X  state  could  indicate  an  uninitialized  signal,  a  signal  held 
between two logic thresholds, or a signal in a 0→1 or 1→0 transition. The X state is handled algebraically
by extending the binary Boolean algebra to a ternary or three-valued DeMorgan’s algebra [18,39].
Thus, even with this extension, many of the desirable mathematical properties of the Boolean gate
model are preserved. The X state implemented this way is also useful in the detection of hazards and
race conditions [23,28,39,40,62,64]. Alternatively, some simulators implement the X state by an
enumeration technique in which the simulation is repeated with the nodes in the X state set to all possible
combinations of 0’s and 1’s [41]. Nodes that remain in a unique binary state for all combinations
are set to this state, while all others are set to X. To simulate tristate gates and logic busses, some simulators
use a fourth state, called the high impedance state, and often denoted by H [16]. This H state is
also  used  sometimes  to  model  dynamic  memory  by  allowing  a  node  to  retain  its  previous  logic  state  if 
the  outputs  of  all  logic  gates  connected  to  the  node  are  at  the  H  level. 
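
The two ways of handling the X state described above can be contrasted with a small sketch. The code below is an illustration under stated assumptions, not the implementation of any cited simulator: the function names (`t_not`, `t_and`, `t_or`, `enumerate_x`) and the string encoding of the states are hypothetical.

```python
from itertools import product

ZERO, ONE, X = "0", "1", "X"  # assumed encoding of the three logic values


def t_not(a):
    return {ZERO: ONE, ONE: ZERO, X: X}[a]


def t_and(a, b):
    if a == ZERO or b == ZERO:      # a controlling 0 decides the output
        return ZERO
    if a == ONE and b == ONE:
        return ONE
    return X


def t_or(a, b):
    # De Morgan's law still holds in the three-valued algebra.
    return t_not(t_and(t_not(a), t_not(b)))


def enumerate_x(func, inputs):
    """Enumeration technique: substitute every 0/1 combination for the X inputs."""
    x_positions = [i for i, value in enumerate(inputs) if value == X]
    results = set()
    for combo in product((ZERO, ONE), repeat=len(x_positions)):
        trial = list(inputs)
        for pos, value in zip(x_positions, combo):
            trial[pos] = value
        results.add(func(*trial))
    return results.pop() if len(results) == 1 else X


def tautology(a):
    return t_or(a, t_not(a))        # a OR (NOT a)


algebraic = tautology(X)                    # pessimistic: the algebra cannot see a == a
enumerated = enumerate_x(tautology, [X])    # both substitutions give the same value

# Static-hazard check for f = (a AND b) OR ((NOT a) AND c) with b = c = 1
# while a is in transition (modeled as X).
hazard = t_or(t_and(X, ONE), t_and(t_not(X), ONE))
```

The algebraic evaluation is cheap but conservative, while enumeration is exact for the inputs it covers at a cost that grows exponentially with the number of X nodes. An X at the output of the last expression flags a potential static hazard.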

As far as simulation is concerned, most gate-level simulators belong to one of two general types.
The first is based on the Huffman logic model [42], as shown in Figure 2.1. In this model, all the feedback
paths in the network are initially broken, resulting in a purely combinatorial network, which is
then levelized in terms of signal dependence. The feedback is restored by inserting delay elements
between the secondary outputs and secondary inputs of the combinatorial part of the network. The
analysis begins by applying the input excitations and following paths where the signal states change
through the network to the outputs. The delays are applied to any secondary output change and the
analysis of the combinatorial block begins once again. The process is repeated until the requested input
sequence has been completed. This approach is used in SALOGS [16], and is quite efficient for circuits
where relatively few delays are significant or, in other words, for nearly combinational circuits.

Figure 2.1: The Huffman logic model for logic analysis

The  second  and  more  common  approach  is  based  on  the  use  of  a  time  queue  (TQ)  [43]  as  shown  in 
Figure  2.2.  Each  entry  in  the  queue  represents  a  discrete  point  in  simulation  time.  Time  moves  ahead 
in fixed increments which correspond to consecutive entries in the TQ. Each entry in the queue contains
a pointer to a list of events which are to occur at that instant of time. An event is usually
defined as a change in the logical state of an output node of an element. The element, in this case, may
be a voltage source or a logic gate. The new state may or may not be the same as the state already held
by the output line. If the new state is different from the old one, then all elements whose input lines
are connected to this output line, called fanout elements, must be processed to see if this change affects
their outputs. If an element gets processed at, say, time $t_i$, and the input event is found to cause an output
event, then the output event is assumed to occur at time $t_i + k$, where $k > 0$ represents a positive
delay through the logic gate. The fanouts of the output node then get scheduled for processing at time
$t_i + k$. If the state of an output node remains unchanged, then the fanouts are not added to the time
queue.  This  approach  is  often  referred  to  as  a  selective  trace  technique,  or  an  event-driven  scheme,  or 
sometimes  even  as  dynamic  leveling.  In  the  case  of  logic  simulation,  no  penalty  in  accuracy  or  stability 
of  analysis  is  incurred  with  the  use  of  the  selective  trace  method.  One  of  the  advantages  of  this  scheme 
is  that  it  allows  different  gates  to  have  different  delays  and,  moreover,  the  delay  value  through  a  gate 
is also allowed to change as the simulation proceeds. This is especially useful for MOS logic gates, which
have different delays for rising and falling transitions at the output. Furthermore,
the presence of feedback among the logic gates does not complicate the simulation, since the delay
through a feedback loop would schedule a gate for processing only in the future and never at the
present time in the queue. Several logic simulators, such as TEGAS [17] and SPLICE [13], use the TQ
approach successfully.
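
The time-queue scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of TEGAS or SPLICE: the function `simulate`, its argument layout, and the two-valued 0/1 encoding are hypothetical, and the sketch does no inertial-delay filtering of events that overtake one another.

```python
from collections import defaultdict


def simulate(gates, state, stimuli, t_end):
    """Event-driven simulation with a time queue.

    gates:   {output_node: (func, input_nodes, delay_of)}, with delay_of(new_value) > 0
    state:   {node: 0 or 1}, updated in place
    stimuli: {time: {input_node: value}}
    Returns the list of (time, node, value) changes that actually occurred.
    """
    fanout = defaultdict(list)
    for out, (_func, inputs, _delay_of) in gates.items():
        for name in inputs:
            fanout[name].append(out)

    queue = defaultdict(list)                 # time slot -> [(node, value), ...]
    for t, changes in stimuli.items():
        queue[t].extend(changes.items())

    trace = []
    for t in range(t_end + 1):                # time advances in fixed increments
        changed = []
        for node, value in queue.pop(t, []):
            if state[node] != value:          # only a real change is an event
                state[node] = value
                trace.append((t, node, value))
                changed.append(node)
        affected = []                         # selective trace: fanouts of changed nodes
        for node in changed:
            for out in fanout[node]:
                if out not in affected:
                    affected.append(out)
        for out in affected:
            func, inputs, delay_of = gates[out]
            new = func(*(state[name] for name in inputs))
            queue[t + delay_of(new)].append((out, new))   # always in the future
    return trace


def inv(a):
    return 1 - a


def rise2_fall1(new_value):
    return 2 if new_value == 1 else 1         # assumed rise and fall delays


# Two inverters in series: a -> b -> c, with a rising at time 1.
chain = {"b": (inv, ["a"], rise2_fall1), "c": (inv, ["b"], rise2_fall1)}
trace = simulate(chain, {"a": 0, "b": 1, "c": 0}, {1: {"a": 1}}, 10)
```

Because every scheduled event lies strictly in the future, a gate whose output feeds back to its own input is handled by the same loop without any special treatment.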

Gate-level  simulators,  however,  are  not  entirely  suitable  for  the  digital  or  logic  simulation  of 
MOS  circuits.  This  is  due  to  the  fundamental  mismatch  between  the  Boolean  gate  model  and  the 
behavior  of  MOS  logic  circuits.  MOS  circuits  consist  of  bidirectional  switching  elements  connected  by 
bidirectional  wires  with  memory  (considering  the  capacitance  of  the  interconnect  and  of  the  transistor 
gates  as  contributing  to  the  wire’s  memory).  Hence,  the  need  for  a  different  approach  to  the  digital 
modeling  and  simulation  of  MOS  circuits  is  apparent  and  is  discussed  in  the  following  section. 

2.3.2  Switch-level  Simulation

A  new  class  of  digital  simulators  known  as  switch-level  simulators  has  emerged  fairly  recently  as 
an  alternative  to  the  more  conventional  gate-level  simulators,  specifically  for  the  simulation  of  MOS 
VLSI  circuits.  The  Boolean  gate  concept  is  discarded  altogether  in  these  simulators,  and  is  replaced  by  a 
bidirectional  switch  model  which  closely  matches  the  structure  and  behavior  of  MOS  circuits. 

One  of  the  first  switch-level  simulators  to  be  implemented  is  MOSSIM  [19],  in  which  an  MOS 
logic  network  is  modeled  as  a  set  of  nodes  interconnected  by  a  set  of  transistor  switches.  MOSSIM  uses 
three logic levels, 0, 1, and X, to describe signal values at the various nodes. The level X is the undefined
or sometimes unknown level used to represent a signal level that cannot be uniquely determined
due to an ambiguity in the network condition. Each node is also assigned a strength which indicates
the extent to which the node can force its value on other nodes connected to it via a path of conducting
switches. Input nodes are the strongest and provide externally generated signals such as power lines,
ground, clock drivers, and data inputs. A node connected to a voltage source through a pullup resistor
is called a pullup node. The pullup resistor is normally realized using a depletion transistor with gate
and source terminals shorted. A pullup node is at level 1 unless there is a path of conducting
transistors to an input node, in which case the pullup node takes on the value of the stronger node. The
rest  of  the  nodes  in  the  circuit  are  normal  nodes.  These  nodes  are  the  weakest  and  are  capable  of  only 
storing  charge  dynamically.  Thus  we  have  three  types  of  nodes  with  strengths  ordered  as 
input  >  pullup  >  normal. 

An  MOS  transistor  is  modeled  as  a  three  node  device  which  acts  as  a  bidirectional  switch  between 
its  drain  and  source  nodes  with  the  signal  at  the  gate  node  controlling  the  state  of  the  switch.  There 
are  two  types  of  transistors  allowed  in  MOSSIM  :  n-type  and  p-type.  When  the  gate  signal  is  a  0,  the 
n-type  (p-type)  switch  is  open  (closed),  and  when  the  gate  signal  is  a  1,  the  n-type  (p-type)  switch  is 
closed (open). The status of either switch is unknown, i.e., it may be open, closed, or somewhere
between,  when  the  gate  signal  is  in  the  X  state.  When  a  switch  is  closed,  it  is  treated  as  a  bidirectional 
switch,  and  no  distinction  is  made  between  drain  and  source  nodes  of  the  device. 

The  network  can  be  described  to  MOSSIM  in  terms  of  transistors,  logic  gates,  and  user-defined 
macros,  but  these  are  all  translated  into  a  transistor  level  representation  for  simulation.  The  program 
begins  by  splitting  each  input  node  (including  the  ground  node)  into  a  number  of  physical  input  nodes, 
one for each transistor to which it is connected. This is possible since the input nodes provide strong signals
to the network which cannot be modified by the internal operations of the network. The gate
node  of  a  transistor  is  treated  as  a  pure  input  to  the  switch  and  its  state  determines  the  conduction  state 
of  the  switch.  This  helps  partition  the  set  of  transistors  into  groups,  which  can  be  defined  as  follows : 
Consider  an  undirected  graph  with  a  vertex  for  every  node  in  the  circuit  and  an  edge  between  drain 
and  source  nodes  for  each  transistor.  The  graph  will  then  have  several  connected  components.  The  set 
of  nodes  and  transistors  corresponding  to  a  component  forms  a  group.  Thus  all  bilateral  interactions 
between  nodes  take  place  within  a  group. 
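
The grouping step described above amounts to finding connected components after the input nodes have been split. The sketch below illustrates this with a union-find structure. It is an assumption-laden example rather than MOSSIM's code: the function names, the `(gate, drain, source)` tuple layout, and the small example network are all hypothetical.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def partition_into_groups(transistors, input_nodes):
    """transistors: list of (gate, drain, source) node names.
    Returns a sorted list of groups, each a list of transistor indices."""
    parent = {}

    def terminal(node, idx):
        # An input node is split into one private copy per transistor,
        # so it can never join two groups together.
        key = (node, idx) if node in input_nodes else node
        parent.setdefault(key, key)
        return key

    drains = []
    for idx, (_gate, drain, source) in enumerate(transistors):
        d, s = terminal(drain, idx), terminal(source, idx)
        drains.append(d)
        parent[find(parent, d)] = find(parent, s)   # edge between drain and source only

    groups = {}
    for idx, d in enumerate(drains):
        groups.setdefault(find(parent, d), []).append(idx)
    return sorted(groups.values())


# Hypothetical network: a pulldown on n1, a pass transistor n1-n2 clocked by phi,
# and a second pulldown on n3 whose gate is n2.
network = [("a", "n1", "gnd"), ("phi", "n1", "n2"), ("n2", "n3", "gnd")]
groups = partition_into_groups(network, {"vdd", "gnd", "a", "phi"})
```

In this example node n2 is the gate of the third transistor, but a gate connection is not an edge of the graph, so the third transistor forms a group of its own. Without the splitting of the ground node, all three transistors would collapse into a single group.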

A  clock  in  MOSSIM  is  defined  as  a  set  of  binary  sequences  to  be  applied  cyclically  to  a  set  of 
input  nodes.  A  phase  is  one  set  of  clock  and  input  data  values.  The  basic  quantum  of  time  is  a  unit 
step.  Within  a  phase,  the  circuit  is  assumed  to  settle  down  after  a  certain  number  of  unit  steps.  The 
simulation begins by first initializing all nodes to the X state. At the beginning of each phase, all input
nodes are assigned their new values and the groups whose input nodes are changed are placed in an
event list that has been initialized previously. Then a series of unit steps is taken until the event list is
emptied,  indicating  that  the  network  has  settled.  Once  the  network  settles,  the  simulation  of  the  next 
phase  can  begin.  During  a  unit  step  simulation,  the  states  of  the  transistors  within  a  group  are  held 
fixed  and  the  values  of  the  pullup  and  normal  nodes  are  updated.  This  is  done  for  each  group  in  the 
event  list.  Each  updating  results  in  a  certain  number  (possibly  zero)  of  nodes  changing  states  which  are 
accumulated  in  a  set  of  active  nodes.  After  all  the  groups  in  the  event  list  are  simulated,  the  transistors 
whose  gate  nodes  are  active  are  updated  and  the  groups  in  which  these  transistors  lie  are  added  to  a  new 
event  list  for  use  in  the  next  unit  step.  By  first  changing  the  node  states  while  holding  the  transistor 
states  fixed  and  then  changing  the  transistor  states  with  the  nodes  fixed,  the  transistors,  in  effect, 
switch one unit of time after their gate nodes change. Thus if the transistor groups are treated as conventional
logic gates, the simulation appears very much like an event-driven, unit-delay gate-level
simulation.  The  procedure  for  updating  the  node  states  within  a  group,  however,  is  very  different. 

We  now  describe  the  algorithms  in  MOSSIM  used  to  update  the  states  of  the  drain  and  source 
nodes  of  transistors  within  a  group  based  on  the  concept  of  node  strengths.  Initially,  all  pullup  nodes 
are  set  to  logical  1.  Next,  an  undirected  graph  is  constructed  with  a  vertex  corresponding  to  each  node 
in the group and an edge between the drain and source nodes of each transistor in the closed state. The
connected components of this graph partition the set of nodes into equivalence classes. Within each
class, the strongest nodes are determined based on the ordering input > pullup > normal. The strength
of the class is then the strength of the strongest nodes. If the states of the strongest nodes are equal,
then the class state is set to this state; otherwise the class state is set to X. If the class strength is pullup, the
class  state  is  always  a  1. 
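
The class-state rule described above can be sketched as follows. This is a simplified illustration, not MOSSIM's implementation: the function `class_states`, the dictionary layout, and the string encoding of states are assumptions, and only transistors known to be closed are passed in. Input nodes drive a class but are never traversed, which mirrors the splitting of input nodes described earlier.

```python
def class_states(nodes, closed_edges):
    """nodes: {name: (kind, state)}, kind in {"input", "pullup", "normal"}.
    closed_edges: (drain, source) pairs of transistors in the closed state.
    Returns the new state of every pullup and normal node."""
    adjacent = {name: [] for name in nodes}
    for drain, source in closed_edges:
        adjacent[drain].append(source)
        adjacent[source].append(drain)

    new_state, seen = {}, set()
    for start, (kind, _state) in nodes.items():
        if kind == "input" or start in seen:
            continue
        seen.add(start)
        stack, members, drivers = [start], [], set()
        while stack:                                  # depth-first search of one class
            name = stack.pop()
            members.append(name)
            for other in adjacent[name]:
                if nodes[other][0] == "input":
                    drivers.add(other)                # strongest nodes of this class
                elif other not in seen:
                    seen.add(other)
                    stack.append(other)
        if drivers:                                   # class strength: input
            candidates = {nodes[d][1] for d in drivers}
        elif any(nodes[m][0] == "pullup" for m in members):
            candidates = {"1"}                        # class strength: pullup
        else:                                         # class strength: normal
            candidates = {nodes[m][1] for m in members}
        state = candidates.pop() if len(candidates) == 1 else "X"
        for m in members:
            new_state[m] = state
    return new_state


# Hypothetical inverter output "out" (pullup node) feeding a storage node "s".
inverter = {"gnd": ("input", "0"), "out": ("pullup", "1"), "s": ("normal", "1")}
pulled_low = class_states(inverter, [("out", "gnd"), ("out", "s")])
pulled_high = class_states(inverter, [("out", "s")])
```

With the pulldown closed, the input node gnd is the strongest member and forces both nodes to 0. With it open, the class strength is pullup and both nodes go to 1.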

If  the  group  contains  x-transistors,  which  are  transistors  having  an  X  state  on  their  gate  nodes, 
then  the  unknown  switching  behavior  of  these  transistors  could  alter  the  class  states.  To  deal  with 
them consistently, MOSSIM adopts the following philosophy: if a node has a unique state regardless of
the  conduction  state  of  the  x-transistors,  then  the  node  will  be  set  to  this  state;  otherwise  it  will  be  set 
to  the  X  state.  Thus  the  state  of  each  class  computed  as  described  above  is  based  on  the  assumption  that 
all  x-transistors  are  open. 

The  second  part  of  simulating  the  group  begins  by  forming  a  supergraph  containing  a  vertex  for 
each  class  and  an  edge  between  two  vertices  if  an  x-transistor  connects  two  network  nodes  in  the  two 
corresponding  classes.  The  connected  components  of  the  supergraph  partition  the  classes  into  a  set  of 
superclasses,  in  which  each  superclass  is  a  set  of  classes  linked  by  x-transistors.  If  a  superclass  contains 
only  one  class  then  no  further  analysis  is  needed.  Otherwise,  the  strength  of  a  superclass  is  computed  as 
the  strength  of  its  strongest  classes.  The  state  of  a  superclass  is  set  to  the  state  of  the  strongest  classes  if 
they  are  all  equal,  and  X  if  they  are  not.  A  class  is  said  to  be  poisoned  if  its  state  is  different  from  the 
superclass  state.  Furthermore,  a  poisoned  class  could  poison  a  neighboring  class  which  is  not  stronger 
than  itself  even  if  the  state  of  the  neighbor  is  the  same  as  the  superclass  state.  Thus  poisoning  can 
spread  through  classes  and  be  stopped  only  by  classes  with  greater  strength  than  the  original  poisoned 
class.  The  state  of  each  poisoned  class  is  then  reset  to  X.  Once  the  states  of  all  the  classes  have  been 
computed,  the  state  of  each  node  in  a  class  is  set  to  its  class  state. 

Several  modifications  and  extensions  of  the  basic  MOSSIM  philosophy  have  been  considered  by  a 
number of authors [20-25,63,65]. In [20], Bryant provides an abstract model for the switch-level simulation
of MOS logic networks which is more general and formal than the one in MOSSIM. Unlike
MOSSIM, only two types of nodes, namely, input nodes and normal nodes, are allowed. A third type of
transistor, called d-type (for depletion), is introduced which is closed regardless of its gate signal. To
model ratioed logic, transistors may have different strengths (or conductances) when in the closed state.
Thus, a stronger transistor (such as an inverter pulldown) is able to override a weaker one (such as a
pullup load transistor). In MOSSIM, each normal node is modeled as having a capacitance of unknown
value which can store charge but cannot drive its signal onto another node in a different state. Unfortunately,
this model cannot describe the behavior of many bus designs in which a relatively high capacitance
bus node is connected to a node of lower capacitance (such as the storage node in a three-transistor
dynamic RAM cell) resulting in both nodes having the same logic state that was originally
on  the  bus.  In  the  new  model  each  normal  node  is  assigned  a  size,  which  is  indicative  of  the  value  of 
the  node-capacitance. 

The  time  and  the  electrical  behavior  of  the  logic  network  are  described  in  a  formal  way  in  [20] 
by  introducing  the  notion  of  a  target  function.  Given  a  particular  set  of  input  node,  transistor,  and 
initial  normal  node  states,  the  target  function  provides  the  final  states  of  the  normal  nodes.  For  circuits 
free of critical races, the logical behavior of the network can be modeled by repeated application of the
target function. The passage of time is modeled just as in MOSSIM, i.e., every application of the target
function is like advancing a unit step in time. The electrical behavior of the network is modeled by
defining the target state function in terms of a set of steady-state voltages in an order-of-magnitude
electrical network. This class of networks models the conducting transistors by linear resistors, where
the resistances (or conductances) of different strength transistors differ by orders of magnitude. As a
result, any path to an input node containing only transistors of large strength is modeled as overriding
any path containing a transistor with lesser strength. Similarly, the normal nodes are modeled by capacitors
where the capacitances for different size nodes differ by orders of magnitude. Thus, the target
states formed on a set of nodes through charge sharing depend only on the state of the largest size
node(s) in the set. Furthermore, no attempt is made to accurately compute the node voltages. Instead,
they are classified into three logic levels, 0, X, and 1. Although the target state is defined in terms of
an electrical model, it can be computed logically, without evaluating any electrical network. By introducing
an abstraction called logic signals, an iterative method which uses only operations on a simple,
discrete  algebra  is  used  for  computing  the  target  state  function.  A  logic  signal  provides  a  composite 
description  of  a  switch-level  network  at  some  node  for  a  particular  set  of  node  and  transistor  states, 
much  in  the  same  way  as  a  Thevenin  equivalent  network  for  an  electrical  network.  Finding  the  target 
state  then  reduces  to  finding  a  minimum  solution  of  a  set  of  equations  involving  logic  signals. 
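
The charge-sharing rule of the order-of-magnitude model can be illustrated with a short sketch. It is not Bryant's logic-signal algorithm, only the special case of normal nodes joined by closed transistors with no path to an input node. The function names, the integer sizes, and the capacitance ratio of 100 are assumptions chosen for the example.

```python
def charge_sharing_state(nodes):
    """nodes: list of (size, state) for connected normal nodes.
    The largest-size node(s) determine the result; a conflict among them gives X."""
    top = max(size for size, _state in nodes)
    states = {state for size, state in nodes if size == top}
    return states.pop() if len(states) == 1 else "X"


def shared_voltage(caps_and_volts):
    """Final voltage after ideal charge sharing: sum(C*V) / sum(C)."""
    total_charge = sum(c * v for c, v in caps_and_volts)
    total_cap = sum(c for c, _v in caps_and_volts)
    return total_charge / total_cap


# A bus node (size 2, state 1) connected to a small storage node (size 1, state 0).
bus_case = charge_sharing_state([(2, "1"), (1, "0")])

# Electrical check with an assumed 100:1 capacitance ratio and a 5 V bus.
v_final = shared_voltage([(100.0, 5.0), (1.0, 0.0)])
```

When all normal nodes have the same size, as in MOSSIM, the same bus example resolves to X, which is the limitation that the node sizes are meant to remove.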


In [21], Byrd et al. have independently developed a consistent, complete, circuit-theoretic
interpretation of switch-level simulation and modeling. They formally relate the true behavior of real
conductance networks and the switch-level model. As in Bryant’s model [20], transistor switches are
modeled as linear conductors whose conductances belong to an arbitrarily deep hierarchy of conductance
classes, $G^1, G^2, \ldots, G^p$, where any $g^i \in G^i$ and $g^j \in G^j$ satisfy $g^i \gg g^j$ if $i > j$. Some drawbacks of
Bryant’s solution of the conductance network using a minimum principle with a discrete algebra are
pointed out, and a more general circuit-theoretic procedure which expresses a signal at a normal
node as a convex combination of the input signals is presented. PARCHEMIN is a switch-level simulator
using these algorithms.

In [22], the notion of a well-designed circuit is introduced and an improved switch-level simulator
that runs extremely fast on such circuits is presented. This simulator also detects race conditions
and handles the X state in a clean and efficient manner. A linear-time algorithm that detects race conditions
in any nonoscillating circuit (i.e., a circuit that is acyclic within a clock phase) has been
developed by Ramachandran [23]. In certain cases this algorithm is overly cautious and might indicate
the presence of a race condition when, in reality, the circuit has none. In [65], the authors
introduce a new model, known as the NC-model, for switch-level simulation, and show that the simulation
of any circuit (including oscillating circuits) can be performed in quadratic time under this new
model.

An alternative approach to switch-level simulation is based on generation and evaluation of symbolic
logic expressions [24]. A special discrete algebra is used, and logic expressions for a node are generated
hierarchically, where each level of hierarchy represents the influence of node signals of a particular
strength on that node. In evaluating the logic expressions, the undefined X state does not present
any special problem due to the versatility of the new algebra. Furthermore, simulating the basic faults
in MOS circuits is easily incorporated, thereby making this a fairly attractive scheme. These ideas are
used in EXPRESS-II [25], a fast and efficient switch-level fault simulator for MOS designs.

2.4  Mixed-mode  or  Hybrid  Simulation

An  ideal  simulator  for  VLSI  circuits  would  be  one  which  has  the  speed  and  efficiency  of  digital 
or  logic  simulators  while  providing  the  accuracy  and  detail  of  an  analog  simulator.  An  attempt  to 
achieve  this  is  through  mixed-mode  or  hybrid  simulation.  In  many  of  the  VLSI  circuits  the  detail  and 
accuracy  provided  by  the  analog  simulators  are  not  required  for  the  entire  circuit  under  investigation, 
but  only  for  some  critical  areas  of  the  circuit.  This  is  particularly  true  of  large  digital  circuits,  where 
often  a  simple  digital  simulation  (gate-level  or  switch-level)  provides  sufficient  information  about  the 
performance  of  much  of  the  circuit,  while  some  parts,  such  as  sense  amplifiers  in  memory  circuits  or 
tightly  coupled  analog  blocks,  might  require  more  detailed  modeling  and  analysis. 

By  providing  a  range  of  models,  from  highly  accurate  and  complex  analog  device  models  to  much 
less  accurate  but  greatly  simplified  gate-level  or  switch-level  models,  the  circuit  designer  can  reduce  the 
simulation  time  significantly  by  choosing  the  computationally  less  expensive  models  whenever  it  is 
appropriate  and  possible.  Another  property  of  large  circuits  which  may  be  exploited  is  their  relative 
inactivity or latency. In a typical VLSI circuit, usually fewer than 20% of the signals change values
significantly at any one time instant.

Hybrid  analysis  programs  allow  the  designer  to  use  a  combination  of  analysis  techniques  and 
models,  ranging  from  circuit  and  timing  simulation  to  much  cheaper  digital  simulation,  in  the  same 
program. These simulators, such as SPLICE [13], DIANA [14], and SAMSON [15], have been observed to
realize a reduction of one or two orders of magnitude in simulation time and substantially lower memory
usage than standard circuit simulators, while still providing a detailed circuit-level analysis where necessary.

Mixed-mode  or  hybrid  simulators,  however,  work  well  as  long  as  only  small,  isolated  sections  of 
the  circuit  need  to  be  simulated  as  analog  circuits.  Unfortunately,  the  partitioning  of  the  circuit  into 
sections  which  require  analog  simulation  and  those  which  do  not  is  not  fully  automatic;  some  amount 
of  human  intervention  is  still  required.  Furthermore,  trying  to  combine  analog  and  digital  models  in  a 
single program requires rather unsatisfactory approximations at the interfaces. For example, if the output
of a section of logic gates is to be interfaced to an input of a section modeled as an analog circuit, a
logic-to-voltage waveform conversion is required. This, of course, cannot be done with any accuracy,
since  much  of  the  necessary  information  is  lacking.  The  resultant  outputs  of  the  analog  section  must 
then  be  viewed  somewhat  skeptically.  Similarly,  certain  states  used  in  logic  simulators,  such  as  the 
unknown  state  X,  or  the  high-impedance  state  H,  do  not  represent  a  single  voltage  and  therefore  cannot 
be  interfaced  with  an  analog  simulator.  Therefore,  unless  great  care  is  exercised,  a  hybrid  simulator 
could  end  up  providing  the  accuracy  of  a  logic  simulator  at  the  speed  of  an  analog  simulator,  rather 
than  vice  versa. 


2.5  Switch-level  Timing  Simulation 

The  problem  of  switch-level  timing  simulation  of  a  digital  circuit  can  be  defined  as  follows : 

Consider the analog waveform $V_n(t)$, $t \in [t_0, t_f]$, at a certain node $n$ in a digital circuit and choose $p-1$
threshold values, ordered as $v_1 < v_2 < \cdots < v_{p-1}$. Define the p-state digital equivalent of $V_n$ to be

$$X_n(t) = x_i \quad \text{if } v_i < V_n(t) \le v_{i+1} \tag{2.25a}$$

where $x_0, x_1, \ldots, x_{p-1}$ are the $p$ digital states and $v_0$ and $v_p$ are the minimum and maximum
values of the analog waveforms respectively. We also define

$$T_n = \{\, t_k : V_n(t_k) \in \{v_1, v_2, \ldots, v_{p-1}\} \,\}. \tag{2.25b}$$

Thus $T_n$ is the set of threshold crossing times of the analog waveform at node $n$ in the circuit, or alternatively,
the set of state transition times of its p-state digital equivalent. The aim of a switch-level
timing simulator is to obtain the p-state digital equivalents $X_n$ for each $n \in \Pi$, with special emphasis on
computing (or estimating) the elements of $T = \bigcup_{n \in \Pi} T_n$, where $\Pi$ denotes the set of nodes of interest to
the user. For brevity in notation, we shall use SLT to stand for switch-level timing, and so the elements
of the set $T$ of threshold crossing times will be referred to as SLT estimates.
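
The definitions (2.25a) and (2.25b) can be made concrete with a small sketch that works on a sampled waveform. The function names, the linear interpolation between samples, and the example thresholds of 1 V and 4 V are assumptions made for illustration and are not part of the definitions themselves.

```python
from bisect import bisect_left


def digital_state(v, thresholds):
    """Index i of the state x_i with v_i < v <= v_{i+1}.
    thresholds holds the sorted interior values v_1 .. v_{p-1}."""
    return bisect_left(thresholds, v)      # number of thresholds strictly below v


def crossing_times(times, values, thresholds):
    """Estimate the set T_n from samples, interpolating linearly between them."""
    crossings = []
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        v0, v1 = values[k], values[k + 1]
        for vth in thresholds:
            if (v0 - vth) * (v1 - vth) < 0:            # strict sign change inside the step
                crossings.append(t0 + (vth - v0) * (t1 - t0) / (v1 - v0))
            elif v1 == vth and v0 != vth:              # sample lands exactly on a threshold
                crossings.append(t1)
    return sorted(crossings)


# A 0 V to 5 V ramp over 10 ns with p = 3 states and assumed thresholds of 1 V and 4 V.
thresholds = [1.0, 4.0]
slt_estimates = crossing_times([0.0, 10.0], [0.0, 5.0], thresholds)
states = [digital_state(v, thresholds) for v in (0.5, 2.5, 4.5)]
```

With $p = 3$, the three indices correspond to a low state, an intermediate state, and a high state, and each threshold crossing is one SLT estimate.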


Since most VLSI circuits are primarily digital in nature, the circuit designer is very often satisfied
with performing an SLT simulation in the design-verification process, since this enables him to estimate
the propagation delays, speeds of computation, optimal clocking rates, etc. The usefulness of an SLT
simulator can be measured by considering two factors, namely, the simulation cost, which is primarily
an increasing function of the CPU time and memory used, and the accuracy of the SLT estimates.
There are two major approaches that could be used to perform an SLT simulation on a large
digital  circuit : 

(1)  Use an analog simulator and convert the resulting analog waveforms into their p-state digital
equivalents directly, by choosing an appropriate set of p-1 threshold voltages.

(2)  Use  a  digital  simulator  with  delay  estimation  that  computes  the  p-state  digital  waveform  at  each 
circuit  node  and  generates  the  SLT  estimates. 

Since it is impossible to obtain the exact waveforms analytically in a typical VLSI circuit, the
SLT estimates produced by standard circuit simulators are considered accurate enough and are often
taken as references against which to compare the accuracy of the SLT estimates produced by other simulators.
Simulators using the first approach include the so-called timing simulators such as MOTIS [5], MOTIS-C
[6], and PREMOS [8], which are analog simulators using relaxation techniques to speed up the simulation
process as described in Section 2.2.2.2.


In  spite  of  the  several  attempts  made  to  speed  up  the  performance  of  standard  circuit  simulators 
as  discussed  in  Section  2.2,  analog  simulators  are  still  very  expensive  to  use  to  analyze  circuits  with 
more  than  10k  devices.  Digital  simulators,  on  the  other  hand,  have  a  distinct  advantage  in  speed  over 
analog simulators. Several large circuits with over 100k transistors have been successfully handled by
these  simulators.  However,  they  provide  rather  inaccurate  SLT  information  due  to  the  poor  modeling 
of  the  dynamics  of  the  circuits.  Most  digital  simulators  produce  two-state  digital  waveforms  and 
account  for  the  circuit  dynamics  by  delaying  the  transition  between  states.  In  all  cases  the  delays  are 
taken to be single-threshold delays. Furthermore, these simulators do not take into account the dependence
of propagation delays on circuit parameters, such as load capacitance, strengths of devices, input
slew-rates, and other factors.

Based  on  the  above  facts,  one  can  conclude  that  the  circuit  designer  who  wishes  to  use  one  of  the 
existing analog or digital simulation tools to generate SLT estimates in VLSI circuits is placed in a difficult
situation. Analog simulators provide fairly accurate SLT estimates at prohibitive simulation costs,
while digital simulators can handle entire VLSI circuits but provide very poor SLT estimates, or sometimes,
none at all.

It is therefore clearly necessary to provide the circuit designer with a simulation tool capable of
providing accurate SLT estimates for VLSI circuits at reasonable simulation costs, thus having the best
features of both analog and digital simulators. To this end, one is more likely to succeed in trying to
incorporate better timing models in digital simulators, since efforts to speed up analog simulators seem
to be approaching a limit which is far below the speeds of the digital ones. Restricting oneself to
MOS technology seems to make the problem a little easier. An attempt has been made recently to
model the MOS transistor as a linear resistor, resulting in an RC-delay model for the circuit dynamics,
which is used in RSIM [26]. This is a logic-level timing simulator which predicts the logic state of a
node and uses an RC time constant to estimate the transition times if the node changes state. The
transistor model in RSIM is a gate-voltage dependent resistance $R_{ds}$ between drain and source terminals.
When the switch is closed, $R_{ds} = R_{eff}$; when it is open, $R_{ds} = \infty$; and when it is in the unknown
state (which means $v_{gate} = X$), the drain-source connection is described by a resistance interval, i.e.,
$R_{ds} = [R_{eff}, \infty]$. The effective resistance $R_{eff}$ is determined separately for each transistor as a
function of the device width and length, the transistor type, and other device parameters. The
determination of the effective resistance is made once for each transistor and is about the only device
information used by RSIM. Voltages in the RSIM model are quantized into one of three values, 0, 1, or X,
according to two threshold voltages, $v_{low}$ and $v_{high}$.


The effect of the resistive network on a particular node is modeled by a Thevenin equivalent
circuit. The values of $V_{thev}$ and $R_{thev}$ are computed, in some cases approximately, based on a
series-parallel-type approach which is illustrated in [27]. The value of $V_{thev}$ (which may be a voltage
interval in some cases) decides the new state of the node. If the new value at a node is different from the
previous one, then a transition is scheduled $R_{thev} C_{load}$ time units later, where $C_{load}$ is the net
capacitance at the node. Actually, RSIM uses three values of the effective resistance for a transistor, namely,
a static value used to determine $V_{thev}$, and two others to be used in determining rise and fall delays.
All these values are determined in a presimulation phase using an accurate circuit simulator such as
SPICE2 [1]. Charge sharing effects are also taken into account. A nice feature of this type of
simulation is that the X state does not impose any particular difficulty as far as the simulation is concerned.
The simulator is event-driven and is fast enough to simulate circuits of up to 50k transistors. The SLT
estimates are, however, computed only by single-threshold RC delays and are sometimes found to be
more than 30% off when compared with those of SPICE2, especially in the case of MOS circuits
with large pass-transistor chains.
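
As a rough illustration of the single-threshold RC delay described above, the following sketch
quantizes a Thevenin voltage into 0, 1, or X and schedules a transition $R_{thev} C_{load}$ time units later.
The function names, the default threshold values, and the treatment of voltages exactly at a threshold are
assumptions made for this example; they are not taken from RSIM.

```python
def quantize(v, v_low, v_high):
    """Map a voltage to '0', '1', or 'X' using two thresholds (assumed convention)."""
    if v <= v_low:
        return "0"
    if v >= v_high:
        return "1"
    return "X"


def schedule_transition(now, old_state, v_thev, r_thev, c_load,
                        v_low=1.0, v_high=4.0):
    """Return (event_time, new_state), or None if the node keeps its state.

    The delay is the single RC time constant r_thev * c_load.
    The default thresholds are illustrative values only.
    """
    new_state = quantize(v_thev, v_low, v_high)
    if new_state == old_state:
        return None
    return (now + r_thev * c_load, new_state)


if __name__ == "__main__":
    # A node at logic 1 is pulled toward 0.2 V through 10 kOhm with a 50 fF load.
    print(schedule_transition(0.0, "1", 0.2, 10e3, 50e-15))
```

Because only one threshold crossing is scheduled per transition, a model of this kind carries no information
about the slope of the waveform, which is the limitation the two-threshold delay operator of Chapter 5 addresses.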

This dissertation deals primarily with the development of a switch-level timing simulator whose
SLT estimates have been empirically observed to be within 10% of those of SPICE2.
The high accuracy of the SLT estimates without the use of an analog simulator can be attributed to the
use of a delay operator, which will be discussed in detail in Chapter 5. This operator uses a notion of
two-threshold delays, and is thus able to account for, among several other factors, the effect of the slope
of the analog input waveforms on the timing at the output of a logic gate or a functional block.


CHAPTER  3 


NETWORK  PARTITIONING  AND  ORDERING 


In this chapter an MOS network model that is used to provide accurate switch-level timing (SLT)
estimates will be presented. The network is then partitioned into several subnetworks, or blocks. The
set of blocks is further partitioned into its strongly connected components (SCC). The SCCs in the
network are then ordered for simulation. Throughout this dissertation, the algorithms will be outlined
and discussed for n-channel MOS (NMOS) circuits with depletion loads only. Several extensions to
handle circuits with other technologies, such as complementary MOS (CMOS), will, however, be
mentioned briefly in Chapter 8.

3.1  NMOS  Network  Model 

An NMOS digital network $\Omega$ consists of a set of nodes N interconnected by a set of n-channel
MOS transistors M. The network description can be extracted directly from the layout using circuit
extractors [49,61], or has to be given by the user. In any case, the network description is assumed to
contain a netlist of all the NMOS transistors along with several geometrical and process parameters
such as length (L) and width (W) of each device, zero-bias device threshold voltage (VTO),
transconductance parameter (KP), the analog waveform at the input sources, and a fixed lumped capacitance
from each node to ground. Specifying a grounded capacitance from each node might seem to be a
restriction, but most circuit extractors could be asked to compute equivalent device capacitances along
with the capacitance due to the interconnect regions. In this chapter, the only device parameter used
will be VTO. This parameter will separate the set of transistors into enhancement and depletion types.
The rest of the parameters will be used in Chapters 4 and 5 to generate accurate SLT estimates.


There are three types of nodes: input nodes, pullup nodes, and normal nodes. Input nodes, which
are modeled as voltage sources, provide the strongest signals to the network from the outside. Examples
of input nodes include the power supply (VDD), the ground node, as well as all the input clock signals.
Pullup  nodes  are  attached  to  the  power  supply  VDD  via  a  pullup  resistor.  These  include  the  output 
nodes  of  NMOS  inverters,  NAND  gates,  NOR  gates,  etc.  A  pullup  node  retains  the  value  of  the  supply 
unless  forced  to  ground  through  a  path  of  conducting  devices.  The  remaining  nodes  in  the  circuit  are 
classified  as  normal  nodes.  These  are  the  weakest  nodes  as  they  cannot  force  their  signals  on  a  stronger 
node  but  are  capable  of  storing  a  signal  dynamically. 

In the context of switch-level timing simulation, as defined in Section 2.5 of this thesis, the user is
only interested in obtaining p-state digital equivalents of the analog waveforms at various nodes in the
circuit over a certain time interval $[t_0, t_f]$. Clearly, the larger the number of states, the finer the
level of detail provided and thus the more useful the information is to the user. It is also clear that
using an analog simulator to obtain the analog waveforms and then converting them to p-state digital
equivalents is highly cost-ineffective for large integrated circuits. Hence, it is desirable to generate the
required digital equivalents directly via p-state digital simulation. However, the complexity of digital
simulation increases dramatically with the number of states p, particularly in the context of generating
accurate timing estimates. The choice of p=2 must be rejected outright, since in this case only binary
(i.e., 0 or 1) waveforms are produced. Binary waveforms contain no information whatsoever on the
slopes of the corresponding analog waveforms, the presence of glitches, or other details which are
often useful to a designer when evaluating the performance of a circuit. In our model, therefore, we
use three states (i.e., p=3) to describe the values of digital signals, which seems to be a fair compromise
between the level of detail and the generation of accurate SLT estimates. Thus at any time $t \in [t_0, t_f]$
the three-state (or ternary) digital signal $X_n(t)$ at node $n \in N$ is related to its analog counterpart $V_n(t)$
as follows:

$$
X_n(t) =
\begin{cases}
0 & \Longleftrightarrow\ 0.0 \le V_n(t) \le V_L \\
u & \Longleftrightarrow\ V_L < V_n(t) < V_H \\
1 & \Longleftrightarrow\ V_H \le V_n(t) \le V_{DD}
\end{cases}
\tag{3.1}
$$


where $V_L$ and $V_H$ are two thresholds chosen such that $0.0 < V_L < V_H < V_{DD}$. Here, u is an intermediate
state between the steady low and high states 0 and 1, used to represent signals in transition, model
slopes of changing analog waveforms, detect spurious glitches and hazards, etc. In our model, the
intermediate state is not used as an unknown or undefined state, as the X state is in MOSSIM [19], but rather as
an analog voltage between the two thresholds $V_L$ and $V_H$, and hence it can never be considered as a steady
state 0 or 1. It is this interpretation of the third logic level that helps simplify the procedure for
switch-level simulation, as will be seen later in Chapter 4. The ternary state $X_n(t)$ of a node $n \in N$ at
some time $t \in [t_0, t_f]$ will be denoted simply by $X_n$ whenever there is no ambiguity in time.
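
The mapping from an analog voltage to a ternary state can be sketched in a few lines of Python. This is
only an illustration: the function names are hypothetical, and assigning a voltage that lies exactly on
$V_L$ or $V_H$ to the steady states 0 and 1 is an assumption about the boundary convention.

```python
def ternary_state(v, v_l, v_h, v_dd):
    """Return '0', 'u', or '1' for an analog voltage v.

    Assumes 0.0 < v_l < v_h < v_dd and that voltages exactly at a
    threshold belong to the steady states (an assumed convention).
    """
    if not 0.0 < v_l < v_h < v_dd:
        raise ValueError("thresholds must satisfy 0 < VL < VH < VDD")
    if not 0.0 <= v <= v_dd:
        raise ValueError("voltage outside [0, VDD]")
    if v <= v_l:
        return "0"
    if v < v_h:
        return "u"
    return "1"


def digitize(samples, v_l, v_h, v_dd):
    """Convert a sampled analog waveform to its ternary equivalent."""
    return [ternary_state(v, v_l, v_h, v_dd) for v in samples]


if __name__ == "__main__":
    # A rising edge sampled at five points, with VL = 1 V, VH = 4 V, VDD = 5 V.
    print(digitize([0.0, 1.0, 2.5, 4.0, 5.0], 1.0, 4.0, 5.0))
```

The time a waveform spends in the u state is what carries the slope information that a binary waveform
cannot represent.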


The ternary algebra used to manipulate the discrete signals is an extension of the binary Boolean
algebra. It is defined on the set $L = \{0, u, 1\}$ with three basic operations:
AND ($\wedge$), OR ($\vee$), and INVERSE ($\neg$). For any $x, y \in L$, the operations of AND and OR are defined as
follows:

$$x \wedge y = \min(x, y), \qquad x \vee y = \max(x, y), \qquad \text{under the ordering } 0 < u < 1,$$

and for any $x \in L$ its INVERSE $\neg x$ is defined as follows:

$$\neg 0 = 1, \qquad \neg u = u, \qquad \neg 1 = 0.$$

Clearly, L is closed under all three operations and the system $(L, \wedge, \vee, \neg)$ forms a distributive lattice [39]
with zero element 0 and universal element 1. Most of the properties of Boolean algebra are preserved
in the ternary algebra, except for the Law of Excluded Middle, since $u \vee \neg u = u \ne 1$ and
$u \wedge \neg u = u \ne 0$.
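
The ternary operations can be sketched in Python as follows. The sketch assumes that AND and OR act as
minimum and maximum under the ordering 0 < u < 1 and that INVERSE exchanges 0 and 1 while leaving u fixed,
which is consistent with the lattice properties stated above; the function names are hypothetical.

```python
from itertools import product

# Position of each ternary value in the assumed ordering 0 < u < 1.
ORDER = {"0": 0, "u": 1, "1": 2}
L = ("0", "u", "1")


def t_and(x, y):
    """Ternary AND: the smaller of x and y in the ordering 0 < u < 1."""
    return min(x, y, key=ORDER.__getitem__)


def t_or(x, y):
    """Ternary OR: the larger of x and y in the ordering 0 < u < 1."""
    return max(x, y, key=ORDER.__getitem__)


def t_not(x):
    """Ternary INVERSE: swaps 0 and 1 and leaves u unchanged."""
    return {"0": "1", "u": "u", "1": "0"}[x]


def is_distributive():
    """Check x AND (y OR z) == (x AND y) OR (x AND z) over all triples."""
    return all(
        t_and(x, t_or(y, z)) == t_or(t_and(x, y), t_and(x, z))
        for x, y, z in product(L, repeat=3)
    )


if __name__ == "__main__":
    print(is_distributive())
    print(t_or("u", t_not("u")), t_and("u", t_not("u")))
```

Restricted to the values 0 and 1, these functions reduce to the ordinary Boolean operations.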

An NMOS transistor is modeled as a three-terminal device with a switch between the drain and
source terminals and the signal at the gate controlling the status of the switch. In this dissertation we
will only consider transistors whose drain and source nodes are different. In some technologies the
drain and source regions of a transistor may correspond to the same net in the layout as a means of
implementing a variable resistance. We shall, however, exclude such networks from our model.
Associated with each device is a resistance which is primarily a function of the ratio of the physical
length to width (L/W) of the device when laid out. There are two types of NMOS transistors, namely,
the enhancement type and the depletion type. Enhancement devices are characterized by positive
device threshold voltages (i.e., VTO > 0) and behave as voltage-controlled switches. Depletion devices, on
the other hand, have a negative VTO and are mainly used to implement pullup resistors. The gate and
source nodes of a depletion device are usually shorted, resulting in a two-terminal resistor. In the case
of an enhancement NMOS device, the switch between drain and source nodes is open, closed, or in an
intermediate state depending on whether the signal at the gate node is a 0, 1, or u, respectively. In the
case of a depletion device, the switch is always closed irrespective of the signal at the gate node.
Algebraically, each transistor $m \in M$ has a state $Z_m \in \{0, u, 1\}$, where 0 indicates open, u indicates
intermediate, and 1 indicates closed. Although the transistor states and the node states are different physical
phenomena, the same mathematical objects will be used to represent both.


Mathematically, the NMOS network $\Omega(N,M)$ can be specified by giving a listing of nodes in N
and transistors in M and the following functions:

NODTYP : $N \to \{input, pullup, normal\}$           the node type
TRNTYP : $M \to \{enhancement, depletion\}$          the transistor type
GATE   : $M \to N$                                   the gate node
SOURCE : $M \to N$                                   the source node
DRAIN  : $M \to N$                                   the drain node
CAP    : $N \to [C_{min}, C_{max}]$                  the node capacitance
RES    : $M \to [R_{min}, R_{max}]$                  the transistor resistance

At any instant in time the state of the network is represented by $\Omega(X,Z)$, where $X = \{X_n : n \in N\}$
and $Z = \{Z_m : m \in M\}$, with $X_n, Z_m \in \{0, u, 1\}$ representing the ternary states of node n and transistor m
at that time instant. Under stable or steady-state conditions, the transistor states Z are functions of
node states X. For example, consider a transistor m with gate node n, i.e., GATE(m) = n. If
TRNTYP(m) = enhancement, then $Z_m = X_n$ in the steady state; otherwise, if
TRNTYP(m) = depletion, then $Z_m = 1$ always.
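
One possible in-memory form of this network description, together with the steady-state rule for transistor
states, is sketched below. The class layout, field names, and the small inverter used as data are assumptions
for illustration; only the functions TRNTYP, GATE, SOURCE, and DRAIN and the rule for $Z_m$ come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transistor:
    """One element of M with its TRNTYP, GATE, SOURCE, and DRAIN values."""
    name: str
    kind: str      # "enhancement" or "depletion"
    gate: str
    source: str
    drain: str


def steady_transistor_states(transistors, node_states):
    """Compute Z from X under steady-state conditions.

    An enhancement transistor takes the state of its gate node;
    a depletion transistor is always closed ("1").
    """
    states = {}
    for m in transistors:
        if m.kind == "depletion":
            states[m.name] = "1"
        elif m.kind == "enhancement":
            states[m.name] = node_states[m.gate]
        else:
            raise ValueError(f"unknown transistor type: {m.kind}")
    return states


# Hypothetical NMOS inverter: m1 is the pulldown, m2 the depletion load.
INVERTER = [
    Transistor("m1", "enhancement", gate="in", source="gnd", drain="out"),
    Transistor("m2", "depletion", gate="out", source="out", drain="vdd"),
]

if __name__ == "__main__":
    x = {"in": "u", "out": "1", "gnd": "0", "vdd": "1"}
    print(steady_transistor_states(INVERTER, x))
```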

3.2  Network  Partitioning 

In this section we describe the strategy and algorithms to partition the NMOS network $\Omega(N,M)$
into several transistor-disjoint subnetworks $\Omega_1, \Omega_2, \ldots, \Omega_k$, where each subnetwork or block $\Omega_i$ has a
certain special configuration that would aid the simulation process. The partitioning strategy is
basically to divide the set of enhancement transistors into two types, namely, driver transistors and pass
transistors. The transistors of a particular type are then grouped together to constitute a subnetwork or
a block if they have a common DC-path between their source and drain nodes (a notion that will be
made precise in Section 3.2.2). The key to deciding whether an enhancement transistor is a driver
transistor or a pass transistor is in the notion of an external node, which will also be defined in Section
3.2.2. It is much easier to formally present our ideas and concepts if the NMOS network is viewed as
an undirected graph; therefore we begin by reviewing some basic fundamentals from graph theory for
the sake of completeness and also for the benefit of readers who are not familiar with the subject. An
excellent reference on the fundamentals of graph theory is the book by Bondy and Murty [50].

3.2.1  Review  of  Graph  Theory 

An undirected graph H is an ordered triple $(V(H), E(H), \psi_H)$ consisting of a nonempty set
$V(H)$ of vertices, a set $E(H)$ of edges that is disjoint from $V(H)$, and an incidence function $\psi_H$
which associates with each edge of H an unordered pair of (not necessarily distinct) vertices in H. If
e is an edge and v and w are vertices such that $\psi_H(e) = \langle v, w \rangle$, then e is said to join v and w, the
vertices v and w are called the ends of e, and moreover, v and w are said to be adjacent in H. In
this case we will usually refer to the edge e as simply $\langle v, w \rangle$. The set of all vertices in H that are
adjacent to the vertex v is denoted by $Adj_H(v)$. The two ends of an edge are incident with the edge
and vice versa. If the two ends of an edge are the same, then the edge is called a loop, otherwise it is a
link. The symbols $\nu(H)$ and $\epsilon(H)$ are used to denote the number of vertices and edges in graph H
respectively, i.e., $\nu(H) = |V(H)|$ and $\epsilon(H) = |E(H)|$. When only one graph is under discussion it
will be denoted by H, and we will use $V$, $E$, $\nu$, and $\epsilon$ instead of $V(H)$, $E(H)$, $\nu(H)$, and $\epsilon(H)$.
An undirected graph is usually represented pictorially on a plane by associating one point (or a dot) with
each vertex and joining two points by a line (not necessarily straight) if the corresponding vertices are
joined by an edge. As an example, consider a graph H with

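$$V(H) = \{v_1, v_2, v_3, v_4, v_5\}$$

$$E(H) = \{e_1, e_2, e_3, e_4, e_5, e_6\}$$

and the incidence function defined by

$$\psi_H(e_1) = \langle v_1, v_2 \rangle, \quad \psi_H(e_2) = \langle v_2, v_4 \rangle, \quad \psi_H(e_3) = \langle v_4, v_3 \rangle,$$

$$\psi_H(e_4) = \langle v_3, v_1 \rangle, \quad \psi_H(e_5) = \langle v_2, v_2 \rangle, \quad \psi_H(e_6) = \langle v_1, v_4 \rangle.$$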

The  pictorial  representation  of  this  graph  is  shown  in  Figure  3.1.  The  point  representing  the  vertex  v5 
is  isolated  in  the  picture  since  there  are  no  edges  incident  on  this  vertex  in  this  case.  Hence  vertices 
with  no  edges  incident  on  them  are  called  isolated  vertices.  Henceforth,  we  shall  refer  to  a  graph  by  its 
pictorial  representation. 

A graph F is a subgraph of H, written as $F \subseteq H$, if $V(F) \subseteq V(H)$, $E(F) \subseteq E(H)$, and $\psi_F$ is the
restriction of $\psi_H$ to $E(F)$. If $V'$ is a subset of V, then the subgraph of H whose vertex set is $V'$ and
whose edge set is the set of all edges of H that have both ends in $V'$ is called the induced subgraph of
H by $V'$ and is denoted by $H[V']$. The induced subgraph $H[V \setminus V']$, denoted by $H - V'$, is the
subgraph obtained from H by deleting the vertices in $V'$ along with all their incident edges. If $E'$ is a
nonempty subset of E, then the subgraph of H induced by $E'$ is the one whose vertex set is the set of
the ends of edges in $E'$ and whose edge set is $E'$; it is denoted by $H[E']$. The subgraph obtained from H by
deleting the edges in $E'$ is denoted by $H - E'$. It must be pointed out that deleting vertices from a
graph involves deleting incident edges also; however, deleting edges involves only the removal of edges
while leaving the set of vertices intact, i.e., $V(H - E') = V(H)$. Similarly, $H + E'$ is a graph obtained
from H by inserting a new set of edges $E'$ which is disjoint from the old set of edges $E(H)$. Again,
in this case, the ends of the edges in $E'$ must necessarily be in $V(H)$ since no new vertices are added. If
F and H are two undirected graphs then their union is a graph, denoted by $F \cup H$, whose vertex set
is $V(F) \cup V(H)$ and whose edge set is $E(F) \cup E(H)$. If F and H are disjoint graphs, then their
union is usually denoted by $F + H$. The degree $d_H(v)$ of a vertex v in H is the number of edges
incident on v, with each loop counting as two edges.
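
These operations are easy to state on a concrete representation. In the sketch below a graph is a set of
vertices plus a dictionary that plays the role of the incidence function, mapping each edge name to its pair
of ends. The representation, the function names, and the exact edge list of the sample graph (a small
multigraph with one loop and one isolated vertex, modeled on the example above) are assumptions.

```python
# Sample graph: six edges, one loop (e5), one isolated vertex (v5).
VERTICES = {"v1", "v2", "v3", "v4", "v5"}
PSI = {
    "e1": ("v1", "v2"), "e2": ("v2", "v4"), "e3": ("v4", "v3"),
    "e4": ("v3", "v1"), "e5": ("v2", "v2"), "e6": ("v1", "v4"),
}


def degree(psi, v):
    """Number of edge ends at v, so that a loop counts twice."""
    return sum((a == v) + (b == v) for a, b in psi.values())


def delete_vertices(vertices, psi, removed):
    """H - V': drop the vertices in `removed` and every edge touching them."""
    kept = set(vertices) - set(removed)
    kept_psi = {e: (a, b) for e, (a, b) in psi.items()
                if a in kept and b in kept}
    return kept, kept_psi


def delete_edges(vertices, psi, removed_edges):
    """H - E': drop edges only; the vertex set is left intact."""
    kept_psi = {e: ends for e, ends in psi.items() if e not in removed_edges}
    return set(vertices), kept_psi


def induced_subgraph(vertices, psi, subset):
    """H[V']: the same as deleting every vertex outside `subset`."""
    return delete_vertices(vertices, psi, set(vertices) - set(subset))


if __name__ == "__main__":
    print({v: degree(PSI, v) for v in sorted(VERTICES)})
    print(delete_vertices(VERTICES, PSI, {"v2"}))
```

The sum of all degrees equals twice the number of edges, which gives a quick consistency check on any
incidence function entered by hand.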

A walk in an undirected graph H is a finite, nonempty sequence $W = v_0 e_1 v_1 e_2 v_2 \cdots e_k v_k$
whose terms are alternately vertices and edges in H such that for each $1 \le i \le k$ the ends of $e_i$ are $v_{i-1}$
and $v_i$. In this case W is said to be a walk from $v_0$ to $v_k$, or a $(v_0, v_k)$-walk in H, and the integer k is
called the length of the walk. The vertices $v_0$ and $v_k$ are called the origin and terminus of the walk
respectively, while the vertices $v_1, v_2, \ldots, v_{k-1}$ are its internal vertices. If all the vertices in a walk are
distinct then it is said to be a path. Usually, the subgraph of H whose vertices and edges are terms of a
path is also referred to as a path. A walk is closed if it has positive length (i.e., k > 0) and its origin and
terminus are the same. A closed walk whose origin and internal vertices are distinct is a cycle; just as
with paths we sometimes use the term "cycle" to denote the graph corresponding to the cycle. Two
vertices v and w of H are said to be connected if there exists a $(v, w)$-path in H. A subgraph F is a
component of H if it is a maximal induced subgraph such that any two of its vertices are connected.
If H has only one component then H is connected, otherwise, it is disconnected. The number of
components of H is denoted by $\omega(H)$.
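
The components of a graph can be found with a breadth-first search from each unvisited vertex. The sketch
below is a standard technique rather than anything specific to this dissertation; the graph representation
and the sample edge list are assumptions made for the example.

```python
from collections import deque


def components(vertices, psi):
    """Return the vertex sets of the components of an undirected graph.

    `psi` maps each edge name to its pair of ends; loops and parallel
    edges are allowed and do not affect connectivity.
    """
    adjacent = {v: set() for v in vertices}
    for a, b in psi.values():
        adjacent[a].add(b)
        adjacent[b].add(a)

    seen = set()
    result = []
    for start in sorted(vertices):
        if start in seen:
            continue
        # Breadth-first search collects everything connected to `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacent[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        result.append(component)
    return result


if __name__ == "__main__":
    vertices = {"v1", "v2", "v3", "v4", "v5"}
    psi = {"e1": ("v1", "v2"), "e2": ("v2", "v4"), "e3": ("v4", "v3"),
           "e4": ("v3", "v1"), "e5": ("v2", "v2"), "e6": ("v1", "v4")}
    parts = components(vertices, psi)
    print(len(parts), parts)
```

Each vertex and each edge end is visited once, so the search runs in time linear in the size of the graph.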

3.2.2  Driver  and  Pass  Transistors 

We  begin  this  section  by  intuitively  explaining  the  difference  between  driver  and  pass  transistors 
through  some  examples.  We  then  formally  present  our  strategy  to  decide  whether  an  enhancement 
device  in  the  network  is  a  driver  transistor  or  a  pass  transistor  and  present  an  algorithm  to  achieve  this 
in  linear  time.  Finally,  we  show  how  the  nodes  and  transistors  in  a  network  can  be  partitioned  into 
various  subnetworks  or  blocks,  where  each  block  could  be  one  of  three  types,  namely,  input  sources 
(SRC),  a  collection  of  driver  transistors  along  with  a  depletion  device  (MFB),  or  a  collection  of  pass 
transistors  (PTB). 

Before going into the formal definitions, we would like to provide the reader with some intuition
on deciding between driver and pass transistors in a network. We define external nodes to be the set of
nodes of "input" strength apart from the ground node together with those nodes of "normal" strength
that are either gate nodes of enhancement transistors or are user-requested output nodes. Now consider
a graph on the nodes of an NMOS network with an edge between the drain and source nodes of each
enhancement transistor. Let us focus our attention on a pullup node, say $n_p$, in the graph. For each
such pullup node we consider the subnetwork composed of the depletion device connected to the pullup
node and the transistors corresponding to all the paths between $n_p$ and the ground node. If all the
nodes corresponding to the internal vertices in each of these paths are of "normal" strength and if none
of these nodes is an external node, we can then define the above subnetwork to be a multi-functional
block (MFB) and all the enhancement transistors in it as driver transistors. Furthermore, each MFB
must contain a unique pullup node. Consider an example of an NMOS network shown in Figure 3.2(a)
and the corresponding graph in Figure 3.2(b). From the above definition, clearly m3 is a driver
transistor. The transistors m1 and m2 are also drivers since the internal node, n4, is of "normal" strength and
is not an external node. The node n3 is an external node by definition and hence m4 and m5 are not
driver transistors. In fact m4, m5, and m6 are pass transistors. The MFB corresponding to the pullup
node, n2, in this example, is the subnetwork consisting of the depletion transistor m8 along with the
driver transistors m1, m2, and m3. The subnetwork composed of the pass transistors m4, m5, and m6 is
called a pass transistor block (PTB). As far as switch-level simulation is concerned, an MFB can be
treated as a switching network of driver transistors between the pullup node and the ground node.
Note that, by definition, the only node that is stronger than the pullup node in such a switching network is
the ground node. Furthermore, one need not compute the waveforms at any of the internal nodes of
the switching network. Therefore the signal at the pullup node of an MFB is computed by a simple
technique based on internal node eliminations, which will be discussed in Section 4.2.2 in Chapter 4. In
fact, as we shall see in Chapter 4, the steady-state signal at the pullup node of an MFB is simply a
Boolean function of the signals at the gate nodes of its driver transistors. For example, in the circuit of
Figure 3.2(a) the signal at the node n2 is $\neg((x_1 \wedge x_2) \vee x_3)$, where $x_1$, $x_2$, and $x_3$ are the signals at the
gate nodes of transistors m1, m2, and m3, respectively. In other words, an MFB can be considered to be
a single-output, multiple-input logic gate. The switch-level simulation of a PTB, however, is a more
difficult task since one needs to compute the signals at each node within the PTB. Therefore, the
algorithms used to simulate a PTB are much more complex than the ones used to simulate an MFB, and
these will be discussed in Section 4.2.3 in Chapter 4. Also, the techniques we will use to delay the
signal transitions at the pullup node of an MFB are different from those we will use for the nodes of a
PTB. Hence we choose to differentiate between driver and pass transistors.

Figure 3.2(a) : An NMOS circuit with external nodes (legend: pullup node, external node)

(b) : The graph representing the circuit in part (a)
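
The view of an MFB as a single-output logic gate can be illustrated with the function
$\neg((x_1 \wedge x_2) \vee x_3)$ quoted above for node n2 of Figure 3.2(a). The sketch below evaluates it over
ternary inputs, assuming AND and OR behave as minimum and maximum under the ordering 0 < u < 1; it is not
the node-elimination technique of Chapter 4, and the function names are hypothetical.

```python
# Position of each ternary value in the assumed ordering 0 < u < 1.
ORDER = {"0": 0, "u": 1, "1": 2}


def t_and(x, y):
    """Ternary AND as the minimum under 0 < u < 1."""
    return min(x, y, key=ORDER.__getitem__)


def t_or(x, y):
    """Ternary OR as the maximum under 0 < u < 1."""
    return max(x, y, key=ORDER.__getitem__)


def t_not(x):
    """Ternary INVERSE: swaps 0 and 1, leaves u unchanged."""
    return {"0": "1", "u": "u", "1": "0"}[x]


def mfb_output(x1, x2, x3):
    """Steady-state signal at the pullup node: NOT((x1 AND x2) OR x3).

    The series pair (x1, x2) and the single transistor x3 form two
    parallel pulldown paths to ground.
    """
    return t_not(t_or(t_and(x1, x2), x3))


if __name__ == "__main__":
    for inputs in [("1", "1", "0"), ("0", "1", "0"), ("u", "1", "0")]:
        print(inputs, "->", mfb_output(*inputs))
```

An input in the u state propagates to the output only when no other pulldown path is fully closed, which is
how signals in transition are carried from one block to the next.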

The above definition for a driver transistor is, in fact, only a sufficient condition satisfied by
driver transistors, as the following example demonstrates. Consider the NMOS network shown in
Figure 3.3(a), and the corresponding graph in Figure 3.3(b). In this example n4 is an external node by
definition. Let us suppose n3 is simply a node of "normal" strength and is not an external node. In this
case the path consisting of m4 and m5 would satisfy the above definition of driver transistors and
hence these transistors would be included in the MFB with pullup node n2. However, one needs to
compute the signal at n4 since this determines the switching state of transistor m7, and in order to do
this, we need to compute the signal at node n3 which, by the above definition, is an internal node of an
MFB. We therefore have to modify our definition of a driver transistor. To this end, we introduce the
concept of a pseudo-external node. A node of "normal" strength is said to be a pseudo-external node if
it can be connected to an external node by a path that does not contain a pullup node or the ground
node. Clearly, the signals at the pseudo-external nodes have to be computed in order to compute the
signals at the external nodes of "normal" strength. Hence such a node cannot be an internal node of an
MFB. We therefore modify our definition of driver transistors to be the transistors in those paths
between a pullup node and ground that do not contain an external or pseudo-external node. Thus
transistors m4 and m5 in the example in Figure 3.3(a) are not driver transistors. The above
modification is, however, still inadequate to be a necessary condition to be satisfied by driver transistors as it
does not agree with our intuition in the following example. Consider the NMOS network shown in
Figure 3.4(a) and the corresponding graph in Figure 3.4(b). In this case we have two pullup nodes,
namely, n2 and n4, and no external or pseudo-external nodes in the network. However, node n3 cannot
be considered an internal node in either of the two MFBs since its signal can be influenced by either of
the two pullup nodes. Hence the transistors m4, m5, and m6 must be treated as pass transistors in this
example. To include this case in our definition we would have to treat the other pullup nodes in the
network as external nodes while we are trying to determine the driver transistors between a particular
pullup node and ground. Thus, if we treat n4 as an external node, then n3 becomes pseudo-external and
hence we get m1 and m2 as the only driver transistors in the MFB corresponding to n2. Similarly, if
we treat n2 as an external node we get m2 as the only driver transistor in the MFB corresponding to n4.

Figure 3.3(a) : An NMOS circuit with pseudo-external nodes
(b) : The graph representing the circuit in part (a)
The purpose of the above discussion was mainly to help the reader form an intuitive
idea of the difference between a driver and a pass transistor. The above definitions were by no
means precise and were not meant to be formal definitions. We now develop a completely precise and
formal definition of driver and pass transistors by introducing the notion of splitting a vertex in a
graph. Consider an undirected graph $H(V, E, \psi_H)$ and a vertex v in the graph of degree $k \ge 1$, i.e.,
$d_H(v) = k \ge 1$. The vertex v is said to be loop-free if there are no loops incident on v. The entire
graph is loop-free if all its vertices are loop-free, i.e., it has no loops as edges. A graph is said to be
isolated if all its vertices are isolated, i.e., it has no edges.

Definition 3.1 : Let v be a loop-free vertex of degree $k \ge 1$ in an undirected graph H. The v-split
graph, or the graph obtained on splitting v in H, is a graph obtained by splitting the vertex v into k
new vertices $y_1, y_2, \ldots, y_k$, with each edge formerly joining the vertex v to $w_i$ now joining $y_i$ to $w_i$.
We denote the v-split graph as $H \bullet v$. More formally we can define the v-split graph of H as

$$H \bullet v = ((H - v) \cup Y) + E_w \tag{3.2}$$

where Y denotes an isolated graph on the k new vertices $\{y_1, y_2, \ldots, y_k\}$ and
$E_w = \{\langle w_i, y_i \rangle : i = 1, 2, \ldots, k\}$. Thus splitting a vertex creates a new graph with $k - 1$ more
vertices but with the same set of edges. This is in contrast to the notion of adding new edges to a graph, in
which case the vertex set is unaltered while new edges are added to the graph. It can easily be seen
that if $k = 1$, then splitting the vertex v does not alter the graph, i.e., $H \bullet v = H$ if $d_H(v) = 1$.
Similarly, the notion of vertex splitting can be extended to include the case $k = 0$ by defining $H \bullet v = H$ if
$d_H(v) = 0$. If $V' = \{v_1, v_2, \ldots, v_{\nu'}\}$ is a subset of loop-free vertices in H then the $V'$-split graph of H
can be defined as follows:

$$H \bullet V' = ( \cdots ((H \bullet v_1) \bullet v_2) \cdots ) \bullet v_{\nu'}. \tag{3.3}$$


H  *3”  is  well-defined  since  the  order  in  which  the  vertices  of  V '  are  split  does  not  matter.  The  end 
result  is  always  the  same.  As  an  example  consider  the  graph  shown  in  Figure  3.5(a).  The  graph 
obtained  by  splitting  the  vertices  v2  and  v2  is  shown  in  Figure  3.5(b). 
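
The splitting operation of Definition 3.1 can be illustrated with a minimal Python sketch. The names `Edge`, `split_vertex`, and `split_vertices`, and the convention of naming the new vertices `(v, i)`, are assumptions made here for illustration and are not part of the original text. Edges carry a label so that the edge set is visibly unchanged by splitting.

```python
from collections import namedtuple

# An edge keeps a stable label so that splitting never changes the edge set.
Edge = namedtuple("Edge", ["label", "u", "w"])

def split_vertex(vertices, edges, v):
    """Return (vertices, edges) of the v-split graph; v must be loop-free."""
    incident = [e for e in edges if v in (e.u, e.w)]
    if any(e.u == e.w for e in incident):
        raise ValueError("vertex has a loop and cannot be split")
    if len(incident) <= 1:           # degree 0 or 1: the graph is unchanged
        return set(vertices), list(edges)
    new_vertices = set(vertices) - {v}
    new_edges = []
    copy = 0
    for e in edges:
        if v in (e.u, e.w):
            copy += 1
            y = (v, copy)            # hypothetical name for the new vertex y_i
            new_vertices.add(y)
            other = e.w if e.u == v else e.u
            new_edges.append(Edge(e.label, y, other))
        else:
            new_edges.append(e)
    return new_vertices, new_edges

def split_vertices(vertices, edges, subset):
    """Split every vertex of `subset`, one after another, as in (3.3)."""
    for v in subset:
        vertices, edges = split_vertex(vertices, edges, v)
    return vertices, edges
```

Splitting a vertex of degree 3 in this sketch adds two vertices and leaves the three edge labels untouched, which matches the count of $k - 1$ new vertices stated above.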

An undirected graph $H$ represents a network $\Omega$ if there is a vertex in $H$ corresponding to each node in the network and an edge between two vertices if the corresponding nodes are the source and drain nodes of some enhancement transistor. Let $M_E$ and $M_D$ denote the sets of enhancement and depletion transistors in the network respectively. We can then formally define a graph representing a network as follows:

Definition 3.2 : An undirected graph $H(V, E, \psi_H)$ is said to represent an NMOS network $\Omega(N, M)$ if there exist bijections $\theta : V \to N$ and $\phi : E \to M_E$ such that $\psi_H(e) = \langle v, w \rangle$ if and only if $\{\theta(v), \theta(w)\} = \{\mathrm{DRAIN}(\phi(e)), \mathrm{SOURCE}(\phi(e))\}$.

Theorem 3.1 : If $H$ represents an NMOS network $\Omega$, then $H$ is a loop-free graph.

Proof : If $e$ is a loop in $H$, then it follows from the above definition that $\theta(v) = \mathrm{DRAIN}(\phi(e)) = \mathrm{SOURCE}(\phi(e))$, where $v$ is the vertex incident with the loop. But this is impossible, since it means that the source and drain nodes of some transistor are tied together, and we do not consider such networks in our model, as explained in Section 3.1. Hence $H$ has no loops. □

Let $N_I$, $N_P$, and $N_N$ denote the sets of input, pullup, and normal nodes in the network respectively. It must be noted that, by definition, the ground node (GND) is treated as an input node. Also, by definition, $N_P = \{n \in N : n = \mathrm{SOURCE}(m) \text{ for some } m \in M_D\}$, i.e., every pullup node is a source node for a depletion device. The fact that there is a unique depletion device for each pullup node follows from the practices of conventional NMOS circuit designers. Let $N_O \subseteq N_N$ be the subset of normal nodes at which the user wishes to observe the output waveforms. Also, let $N_G = \{n \in N_N : n = \mathrm{GATE}(m) \text{ for some } m \in M_E\}$ denote the set of normal nodes that are gate nodes of enhancement transistors in the network. The nodes in $N_G$ are also called controlling nodes [22,23] since these nodes control the state of the transistor switches in the network.

Definition 3.3 : The set of external nodes is defined as

$$N_E = N_G \cup N_O \cup (N_I \setminus \{\mathrm{GND}\}) \tag{3.4}$$

the union of three sets, namely, the set of normal nodes which are gate nodes of enhancement transistors, the set of user-requested normal output nodes, and the set of input nodes without the ground node.

Let $V_I$, $V_P$, and $V_E$ denote the sets of input, pullup, and external vertices in $H$ corresponding to the input, pullup, and external nodes in the network. Let $H_I = H \bullet V_I$ be the graph obtained by splitting the input vertices in $H$. In other switch-level simulators [19,25,26] the transistors in the network are partitioned into several groups where each transistor group is simply a component of $H_I$. We would, however, like to further partition the transistors into driver and pass transistors. For this purpose we consider $H_{IP} = H_I \bullet V_P$, which is the graph obtained by splitting the pullup vertices in addition to the input vertices of $H$. The strength of a vertex $v$ in $H$ is the strength of the corresponding node $\theta(v)$ in the network $\Omega$. Splitting a vertex retains the strength, i.e., the strength of the new vertices is the same as that of the original vertex before splitting. Also, splitting a vertex in a graph does not change the set of edges. Let $C^H$ denote the subgraph of $H$ induced by the edges in $E(C)$ for any component $C$ of $H_{IP}$. Note that $E(H_{IP}) = E(H)$ and hence $C^H$ is well-defined. Consider a component $C$ of $H_{IP}$. Then, clearly, $C^H$ satisfies one and only one of the following conditions:

1. $C^H$ contains at least one external vertex.

2(a). $C^H$ contains no external vertices and no pullup vertices.

2(b). $C^H$ contains no external vertices and exactly one pullup vertex.

2(c). $C^H$ contains no external vertices and at least two pullup vertices.

Definition 3.4 : A component $C$ of $H_{IP}$ is said to be a driver component if $C^H$ satisfies condition 2(b) given above.

Definition 3.5 : A component $C$ of $H_{IP}$ is said to be a pass component if $C^H$ satisfies either condition 1, or 2(a), or 2(c) given above.


A component satisfying condition 2(a), i.e., having no external and no pullup vertices, is very rare, since it represents a subnetwork containing only normal nodes (possibly together with the ground node), none of which is a gate node of an enhancement device or a user-requested output node. Thus, this type of subnetwork neither interacts with other subnetworks nor is of any interest to the user. For the sake of completeness, however, we include this possibility also and label the component as a pass component.

The  edges  in  a  pass  component  are  called  pass  edges  while  those  in  a  driver  component  are  called 
driver  edges.  It  must  be  mentioned,  once  again,  that  splitting  vertices  in  graphs  does  not  alter  the  edge 
set  of  the  original  graph  and  so  we  have  a  partition  of  the  edges  of  H  into  two  sets,  namely,  the  set  of 
pass  edges  EP  and  the  set  of  driver  edges  ED.  We  are  now  ready  to  define  driver  transistors  and  pass 
transistors  in  the  NMOS  network. 

Definition 3.6 : An enhancement transistor $m$ in the NMOS network $\Omega$ is a driver transistor if $\phi^{-1}(m) \in E_D$ and is a pass transistor if $\phi^{-1}(m) \in E_P$, where $\phi^{-1}(m) = e \iff m = \phi(e)$.

We now form subgraphs with pass edges and driver edges and use these to define partitions of the NMOS network into special subnetworks. Let $H_1 = H_I - E_P$ be the graph obtained by removing all the pass edges from the $V_I$-split graph of $H$ and let $H_2 = H_I - E_D$ be the graph obtained by removing all the driver edges from $H_I$. Hence $H_1$ contains only driver edges and $H_2$ contains only pass edges. The subgraph induced by the driver edges in a component of $H_1$ is called a D-block of $H$ and the subgraph induced by the pass edges in a component of $H_2$ is called a P-block of $H$. Once again, we make no distinction between edges in $H$ and the graphs obtained by splitting its vertices since all these graphs have the same set of edges. We thus have partitioned the graph $H$ into several edge-disjoint subgraphs $H_i$, $i = 1, 2, \ldots, s$, where each $H_i$ is either a D-block or a P-block. If $H_i$ is a D-block then it must have a unique pullup vertex and no external vertices as a consequence of its definition. This fact, together with the fact that in conventional NMOS designs a pullup node is connected to a unique depletion device, allows us to make the following definition. The notion of an induced subnetwork is similar to that of induced subgraphs in a graph.

Definition 3.7 : A multifunctional block (MFB) is a subnetwork of $\Omega$ induced by the transistors corresponding to the edges of a D-block in $H$ together with the depletion device connected to its pullup vertex (node). An MFB is a proper MFB if it also contains the ground node (which incidentally is not an external node and hence does not violate the above definition). In an improper MFB the pullup node is always stuck at 1 (i.e., maintains the value of VDD) and hence we shall only consider proper MFB’s, which we will refer to simply as MFB. The pullup node is the output node of the MFB while the gate nodes of the driver transistors are its input nodes. The rest of the nodes, namely the drain and source nodes of the driver transistors, apart from the pullup node and the ground node, are the internal nodes of the MFB.

Definition 3.8 : A pass transistor block (PTB) is a subnetwork of $\Omega$ induced by the transistors corresponding to the edges of a P-block in $H$. Once again, the gate nodes of all the pass transistors are input nodes to the PTB. The rest of the nodes, namely, the drain and source nodes of the pass transistors, could either be input nodes, or output nodes, or both (sometimes called ioputs for both input and output [7]), or none of the above depending upon the interaction of the PTB with the other blocks in the network. If a drain or source node of a pass transistor is of input strength it is an input node to the PTB, if it is of pullup strength it is an ioput (i.e., both input and output) node, and if it is a normal external node it is strictly an output node of the PTB.

The above definitions of driver and pass transistors completely agree with the author’s intuition in all cases considered. For example, consider, once again, the circuit in Figure 3.4(a). The graph $H_{IP}$ in this case, shown in Figure 3.6, has three components. The subgraph $C_i^H$ in this example is the same as the component $C_i$ itself, for each $i = 1, 2, 3$. The components $C_1$ and $C_3$ clearly contain no external vertices and exactly one pullup vertex and hence both are driver components according to Definition 3.4. The component $C_2$, however, contains no external vertices but has two pullup vertices and is hence a pass component according to Definition 3.5. A more detailed example is given in Section 3.4.


3.2.3 Partitioning Algorithm and Its Complexity

In this section we will discuss the algorithm to partition the NMOS network into MFB’s and PTB’s. Instead of dealing with the network $\Omega(N, M)$ we will be concerned with the graph $H(V, E)$ that represents the network. Obtaining the graph that represents the network merely involves altering the data structure that represents the network to the one that represents a graph. Once we have identified the D-blocks and P-blocks in $H$ then, clearly, identifying the MFB’s and PTB’s is trivial. Hence we shall mainly concentrate on the procedure PARTITION given below that partitions the graph $H$ into several edge-disjoint subgraphs and labels each subgraph as either a D-block or a P-block.


Algorithm 3.1

Input :  An undirected graph H(V,E) with
         V_I : the subset of input vertices,
         V_P : the subset of pullup vertices, and
         V_E : the subset of external vertices.

Output:  A set of edge-disjoint subgraphs Σ = {H_1, H_2, ..., H_s} of H
         and a function BLK : Σ → {"D-block", "P-block"}.

procedure PARTITION(H)
begin
    E_D ← ∅
    E_P ← ∅
    F_1 ← SPLIT(H, V_I)
    F_2 ← SPLIT(F_1, V_P)
    Φ ← COMPONENTS(F_2)
    for each C_j ∈ Φ do
    begin
        n_E ← |V_E ∩ V(C_j^H)|
        n_P ← |V_P ∩ V(C_j^H)|
        if (n_P = 1 & n_E = 0) then
            E_D ← E_D ∪ E(C_j)
        else
            E_P ← E_P ∪ E(C_j)
        end if
    end
    Σ_1 ← COMPONENTS(F_1 − E_P)
    for each H_i ∈ Σ_1 do
        BLK(H_i) ← "D-block"
    Σ_2 ← COMPONENTS(F_1 − E_D)
    for each H_i ∈ Σ_2 do
        BLK(H_i) ← "P-block"
    Σ ← Σ_1 ∪ Σ_2
    return (Σ, BLK)
end
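
The steps of PARTITION can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the thesis: the function names (`split_endpoints`, `edge_components`, `partition`) and the dictionary-based graph representation are hypothetical, splitting is realized by giving every edge end at a split vertex its own copy named `(vertex, edge_label)`, and blocks are returned as sets of edge labels, so components without edges are ignored. The sketch also assumes that the input and pullup vertex sets are disjoint.

```python
def split_endpoints(edges, split_set):
    """Edges of the split graph: each edge end at a split vertex gets its own copy."""
    out = {}
    for label, (u, w) in edges.items():
        u2 = (u, label) if u in split_set else u
        w2 = (w, label) if w in split_set else w
        out[label] = (u2, w2)
    return out

def edge_components(edges):
    """Group edge labels by connected component, using an iterative DFS."""
    adj = {}
    for label, (u, w) in edges.items():
        adj.setdefault(u, []).append((label, w))
        adj.setdefault(w, []).append((label, u))
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, labels = [start], set()
        while stack:
            x = stack.pop()
            for label, y in adj[x]:
                labels.add(label)
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(labels)
    return comps

def partition(edges, v_in, v_pull, v_ext):
    """edges: dict label -> (u, w) of H. Returns a list of (kind, set_of_edge_labels)."""
    f1 = split_endpoints(edges, v_in)       # F_1 = SPLIT(H, V_I)
    f2 = split_endpoints(f1, v_pull)        # F_2 = SPLIT(F_1, V_P)
    e_d, e_p = set(), set()
    for comp in edge_components(f2):
        # Vertices of C^H: the original end vertices of the edges in the component.
        orig_vertices = {x for lab in comp for x in edges[lab]}
        n_e = len(orig_vertices & v_ext)
        n_p = len(orig_vertices & v_pull)
        (e_d if (n_p == 1 and n_e == 0) else e_p).update(comp)
    blocks = []
    for kind, keep in (("D-block", e_d), ("P-block", e_p)):
        sub = {lab: f1[lab] for lab in keep}    # F_1 restricted to the kept edges
        for comp in edge_components(sub):
            blocks.append((kind, comp))
    return blocks
```

As a usage example, for a hypothetical two-input NOR gate with two driver transistors `a` and `b` joining the pullup vertex `out` to `GND`, the call `partition({"a": ("out", "GND"), "b": ("out", "GND")}, {"GND"}, {"out"}, set())` yields a single D-block containing both edges: each transistor is its own driver component after the pullup vertex is split, and the two are merged again because the pullup vertex is not split in F_1.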


In the above algorithm we must ensure that any vertex that is split in a graph is, in fact, loop-free. This is indeed the case since from Theorem 3.1 we have that the entire graph $H$ is loop-free. The time complexity of an algorithm to solve a problem is said to be $O(f(n))$ if the maximum amount of computation time (or number of computation steps) taken by the algorithm is at most $c f(n)$ over all inputs of size $n$, where $c$ is some constant. The space complexity is similarly defined as an upper bound on the amount of space required by an algorithm to solve a problem. Two excellent references on the subject of time and space complexity of algorithms are Aho, Hopcroft, and Ullman [51] and Garey and Johnson [52]. In most graph algorithms the input size $n$ is taken to be $|V| + |E|$, where $|V|$ and $|E|$ are the number of vertices and edges in the graph respectively. The time (or space) complexity is said to be linear if $f(n) = n$. The following theorem demonstrates that Algorithm 3.1, described above, is of linear time complexity.


Theorem 3.2 : Algorithm 3.1, described above, correctly partitions the edges of $H$ into driver edges and pass edges and its time complexity is $O(|V| + |E|)$, where $V$ is the set of vertices and $E$ is the set of edges in graph $H$.

Proof : The correctness of the algorithm can easily be verified since it partitions the edges of $H$ directly according to Definitions 3.4 and 3.5.

In order to discuss the time complexity, we will use the adjacency list [51] representation for graphs. This consists of a list of vertices and a linked list of edges. Each element of the vertex list contains the name (or label) of a vertex, say $v$, followed by a pointer to the location in the edge list of the first edge incident on it. Each element of the edge list contains the name of the vertex adjacent to $v$, an edge label, followed by the location of the next edge incident on $v$, and so on. A null-pointer (0) indicates that there are no more edges incident on $v$. This is repeated for each vertex in the graph. In case of undirected graphs each edge $\langle v, w \rangle$ appears twice, once in the adjacency list of $v$ and once in that of $w$. In this case there is a link established between the two locations. The total storage space required by this representation is $O(|V| + |E|)$.
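
The adjacency list representation described above can be sketched in Python with array indices standing in for pointers. The function names `build_adjacency` and `neighbors` and the four-field cell layout are assumptions chosen for this sketch; `None` plays the role of the null-pointer, and the fourth field of each cell is the link between the two locations of the same undirected edge.

```python
def build_adjacency(vertices, edges):
    """edges: list of (label, v, w). Returns (first, cells).

    first[v] : index of the first edge cell of v, or None (the null pointer).
    cells[i] : [neighbor, label, next_index, twin_index].
    """
    first = {v: None for v in vertices}
    cells = []
    for label, v, w in edges:
        i, j = len(cells), len(cells) + 1
        cells.append([w, label, first[v], j])   # cell in the list of v
        first[v] = i
        cells.append([v, label, first[w], i])   # cell in the list of w
        first[w] = j
    return first, cells

def neighbors(first, cells, v):
    """Walk the linked list of v, yielding (neighbor, edge label) pairs."""
    i = first[v]
    while i is not None:
        yield cells[i][0], cells[i][1]
        i = cells[i][2]
```

The sketch stores one entry per vertex and two cells per edge, which is consistent with the $O(|V| + |E|)$ storage bound stated above.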

The procedures SPLIT and COMPONENTS are used several times in the above algorithm. If we can show that the time complexity of each of these two procedures is $O(|V| + |E|)$, then we are done with the proof since the rest of the computations in PARTITION can easily be verified to be of linear time complexity. Consider the operation of splitting a vertex $v$ of degree $k$ from a graph $F$. This merely involves altering the data structure to represent the new graph and can be easily shown to have a time complexity of $O(k)$. Thus SPLIT($F$, $V'$) is of time complexity $O(q)$, where $V' \subseteq V(F)$ and $q = \sum_{v \in V'} d_F(v)$. Since each edge contributes at most two to this sum, $q \leq 2|E(F)|$, and so SPLIT($F$, $V'$) is of complexity $O(|E(F)|)$. We have therefore established that both SPLIT($H$, $V_I$) and SPLIT($F_1$, $V_P$) require $O(|E|)$ computation steps, where $E = E(H) = E(F_1)$, since the splitting of a vertex from a graph does not alter the edge sets.

The procedure COMPONENTS($F$) returns the various components in the graph $F$. A Boolean array of the vertices is maintained to mark a vertex as new or old, such that every time this array is altered, a pointer exists to indicate the location of the first vertex marked new. Initially all vertices of $F$ are marked new. The procedure begins by starting from the first vertex marked new and using a depth-first search (DFS) algorithm [51] to determine all the vertices connected to the starting vertex via a path in $F$. These vertices induce a component and are all marked old. The whole process is repeated by starting from the first vertex that is now still marked new until all vertices are marked old. Each application of the DFS algorithm returns the list of vertices in a component of $F$ in computation time linearly proportional to the number of vertices and edges in that component [51]. Thus if one does not have to scan the array to look for a starting vertex marked new, which is possible by maintaining the required pointer, the time complexity of the entire procedure COMPONENTS($F$) is $O(|V(F)| + |E(F)|)$. This procedure is used thrice in PARTITION($H$), each time on a graph with at most $|E(H)|$ edges and, since splitting adds at most $2|E(H)|$ vertices, at most $|V(H)| + 2|E(H)|$ vertices. We can therefore conclude that the time complexity of PARTITION($H$) is $O(|V| + |E|)$. □

To model the voltage-source elements connected to the input nodes of the network we introduce a third type of block called input sources (SRC) consisting of only a node of input strength (and no transistors). This node is said to be the output node of the SRC. Thus, in this section, we have shown why and how we partition an NMOS network $\Omega(N, M)$ into several subnetworks where each subnetwork is one of three types, namely, MFB, PTB, or SRC. We have also demonstrated an algorithm by which this partitioning can be achieved in computation time that is at most linearly proportional to the number of nodes and transistors in the network. We will use the same symbol $\Sigma$ to denote the set of partitioned blocks in the network and henceforth we shall refer to the partitioned NMOS network as $\Omega(N, M, \Sigma)$ along with a function BLK : $\Sigma \to$ {"MFB", "PTB", "SRC"} indicating the type of block. Furthermore, INP($\Omega_j$) and OUT($\Omega_j$) will be used to denote the sets of input and output nodes of subnetwork $\Omega_j \in \Sigma$.


3.3  Ordering  of  Partitioned  Blocks  for  Processing 

Let $\Omega(N, M, \Sigma)$ be the NMOS network that has been partitioned into MFB’s, PTB’s, and SRC’s. We
will  say  that  the  above  network  has  been  processed  if  the  ternary  digital  waveforms  at  each  external 
node  in  the  network  are  obtained.  The  network  will  be  processed  by  processing  each  of  its  blocks  in  a 
certain  order.  A  block  is  said  to  be  processed,  if  given  the  ternary  waveforms  at  the  input  nodes  to  the 
block,  the  waveforms  at  its  output  nodes  are  obtained.  Thus,  in  order  to  process  a  block,  the  ternary 
waveforms  at  its  input  nodes  must  be  known.  Hence,  we  must  process  the  blocks  in  a  certain  order  so 
that  this  condition  is  always  satisfied  (whenever  possible).  In  this  section  we  will  show  when  such  an 
ordering  exists,  and  if  so,  how  one  obtains  it. 

Definition 3.9 : For each node $n_i \in N$ in the network, let FOUT($n_i$) denote the fanout list for the node, which is the set of blocks in $\Sigma$ having $n_i$ as an input node, and let FIN($n_i$) denote its fanin list, which is the set of blocks with $n_i$ as an output node. Thus,

$$\mathrm{FOUT}(n_i) = \{\Omega_j : n_i \in \mathrm{INP}(\Omega_j)\}$$

and

$$\mathrm{FIN}(n_i) = \{\Omega_j : n_i \in \mathrm{OUT}(\Omega_j)\}.$$

It must be noted that if $n_i$ is an ioput node of a PTB then the PTB would appear both in its fanin and fanout lists. Furthermore, either list could be empty for certain nodes; for example, both lists would be empty for internal nodes of an MFB. Let $(\Omega_j, \Omega_k, n_i)$ denote an ordered triple in $\Sigma \times \Sigma \times N$. The ordered triple $(\Omega_j, \Omega_k, n_i)$ is said to be an I/O-triple if $\Omega_j \in \mathrm{FIN}(n_i)$ and $\Omega_k \in \mathrm{FOUT}(n_i)$. If a node $n_i$ is of pullup strength, i.e., NODTYP($n_i$) = pullup, and if it is an ioput node of a PTB $\Omega_j$, then the I/O-triple $(\Omega_j, \Omega_j, n_i)$ is said to be a nonadjacent I/O-triple. An I/O-triple that is not a nonadjacent I/O-triple is said to be an adjacent I/O-triple. It must be emphasized that in the case where $n_i$ is a pullup node that is an ioput of a PTB, $\Omega_j$, the only nonadjacent I/O-triple in $\mathrm{FIN}(n_i) \times \mathrm{FOUT}(n_i) \times \{n_i\}$ is $(\Omega_j, \Omega_j, n_i)$; the remaining I/O-triples are adjacent. In this case, if there is another node $n_q$ that is not of pullup strength, i.e., NODTYP($n_q$) $\neq$ pullup, such that $\Omega_j$ appears both in its fanin and fanout lists, then the I/O-triple $(\Omega_j, \Omega_j, n_q)$ is indeed an adjacent I/O-triple. The fact that a pullup node can be an ioput of only one PTB follows from the definition of the PTB. Thus we have partitioned the set of I/O-triples into two disjoint categories, namely, the adjacent ones and the nonadjacent ones. Using the adjacent I/O-triples in the network, we will now introduce the notion of a good ordering in which the blocks of a network could be processed.

Definition 3.10 : A sequential ordering $R$ on the blocks of a partitioned network $\Omega(N, M, \Sigma)$ is a 1-1 function $R : \Sigma \to \{1, 2, \ldots, s\}$ where $s = |\Sigma|$. The sequential ordering $R$ is said to be a good ordering for the network if $R(\Omega_j) < R(\Omega_k)$ for every adjacent I/O-triple $(\Omega_j, \Omega_k, n_i)$ in the network. We exclude nonadjacent I/O-triples from our definition since in this case the equality will be forced (and so the inequality will never be satisfied) for any sequential ordering.
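
Definitions 3.9 and 3.10 can be made concrete with a small Python sketch. The function names `io_triples` and `is_good_ordering`, the dictionary-based inputs, and the string tags for node and block types are assumptions introduced here for illustration only. The sketch builds the fanin and fanout lists, enumerates the I/O-triples, separates the nonadjacent ones, and checks a candidate sequential ordering.

```python
def io_triples(inp, out, nodtyp, blk):
    """Return (adjacent, nonadjacent) sets of I/O-triples (block_j, block_k, node).

    inp, out : dict block -> set of input / output nodes of that block
    nodtyp   : dict node -> "input" | "pullup" | "normal"
    blk      : dict block -> "MFB" | "PTB" | "SRC"
    """
    nodes = set().union(*inp.values(), *out.values())
    fin = {n: {b for b in out if n in out[b]} for n in nodes}    # fanin lists
    fout = {n: {b for b in inp if n in inp[b]} for n in nodes}   # fanout lists
    adjacent, nonadjacent = set(), set()
    for n in nodes:
        for bj in fin[n]:
            for bk in fout[n]:
                ioput_of_ptb = bj == bk and blk[bj] == "PTB"
                if nodtyp[n] == "pullup" and ioput_of_ptb:
                    nonadjacent.add((bj, bk, n))
                else:
                    adjacent.add((bj, bk, n))
    return adjacent, nonadjacent

def is_good_ordering(rank, adjacent):
    """rank: dict block -> position in 1..s. True if R(j) < R(k) for every adjacent triple."""
    return all(rank[bj] < rank[bk] for bj, bk, _ in adjacent)
```

For instance, if a hypothetical MFB `m1` drives a pullup node `p` that is an ioput of a PTB `t`, the sketch reports `(t, t, p)` as the only nonadjacent triple at `p`, while `(m1, t, p)` is adjacent and forces `m1` to be ordered before `t`.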

A good ordering, as defined above, is clearly a desirable ordering for processing the blocks in a network, since in this case, whenever a block is scheduled for processing, all the blocks in the fanin lists of each of its input nodes have been previously processed, thus providing input signals to this block. A good ordering, however, may not exist for some networks. As an example, consider an MFB $\Omega_k$ in a network having its output node $n_p$ connected back to one of its inputs. In this case, the network is said to have feedback, and the definition of a good ordering would be violated by the adjacent I/O-triple $(\Omega_k, \Omega_k, n_p)$ for any sequential ordering. Hence, there is no good ordering for such a network. In the remaining part of this chapter we will show that a good ordering exists only for networks not having any kind of feedback, and proceed to handle the case of a network with feedback. The latter is important since most of the networks designed in present day NMOS technology do have feedback in some form or another, for example, flip-flops, ring oscillators, and most clocked sequential circuits in general. To this end, we will use the notion of a directed graph derived from a partitioned network. But first we review some basic concepts on directed graphs from Bondy and Murty [50], for the sake of readers not very familiar with the subject.

3.3.1  Directed  Graphs 

A directed graph $G$, often abbreviated as a digraph, is formally defined as an ordered triple $(V(G), A(G), \psi_G)$ consisting of a nonempty set $V(G)$ of vertices, a set $A(G)$ of arcs that is disjoint from $V(G)$, and an incidence function $\psi_G$ that associates with each arc of $G$ an ordered pair of (not necessarily distinct) vertices of $G$. If $a$ is an arc and $v$ and $w$ are vertices such that $\psi_G(a) = (v, w)$, then $a$ is said to join $v$ to $w$; $v$ is the tail of $a$, and $w$ is its head, and the arc is usually referred to as simply $(v, w)$. A digraph $G'$ is a subdigraph of $G$ if $V(G') \subseteq V(G)$, $A(G') \subseteq A(G)$, and the incidence function $\psi_{G'}$ is the restriction of $\psi_G$ to $A(G')$. With each digraph $G$ we can associate an undirected graph $H$ on the same vertex set; corresponding to each arc of $G$ there is an edge of $H$ with the same ends. The graph $H$ is said to be the underlying graph of $G$. The terminology and notation for subdigraphs are similar to those used for subgraphs. Just as graphs, digraphs also have a simple pictorial representation. A digraph is represented by a diagram of its underlying graph together with arrows on its edges, with each arrow pointing towards the head of the corresponding arc. Figure 3.7(a) shows a digraph $G$ and its underlying graph $H$ is shown in Figure 3.7(b).

A directed walk in $G$ is a finite nonempty sequence $W = (v_0, a_1, v_1, \ldots, a_k, v_k)$, whose terms alternate between vertices and arcs, such that, for each $i = 1, 2, \ldots, k$, the arc $a_i$ has tail $v_{i-1}$ and head $v_i$. Directed paths and cycles are similarly defined. The vertex $v_0$ is called the origin of the directed path while $v_k$ is its terminus, and the rest of the vertices are called internal vertices. The integer $k$ denotes the length of the directed path. Once again, the integer $k$ denotes the length of the directed cycle. A directed cycle of length $k$ is referred to as a $k$-cycle. If there exists an arc $a$ in $G$ such that $\psi_G(a) = (v, v)$, then $a$ is a loop in $G$, and $(v, a, v)$ is an example of a one-cycle in $G$. As with paths and cycles in undirected graphs, we will also refer to the subdigraphs induced by the arcs in a directed path or cycle as a directed path or cycle. Further, for convenience, we will drop the term "directed" and refer to directed paths and directed cycles simply as paths and cycles.

A path in $G$ with origin $v$ and terminus $w$ is called a $(v, w)$-path. If there is a $(v, w)$-path in $G$ then the vertex $w$ is said to be reachable from $v$ in $G$. This, however, does not imply that $v$ is also reachable from $w$. Two vertices $v$ and $w$ are said to be strongly connected in $G$, denoted by $v \sim w$, if each is reachable from the other. Clearly, $\sim$ is an equivalence relation on $V(G)$ and it partitions $V(G)$ into nonempty subsets $V_1, V_2, \ldots, V_\mu$, such that if $v \in V_i$ and $w$ is strongly connected to $v$ in $G$, then $w$ must also be in $V_i$. The subdigraphs $G[V_1], G[V_2], \ldots, G[V_\mu]$ induced by the partition are called the strongly connected components of $G$. Note that, by definition, a vertex $v$ in $G$ is always strongly connected to itself, i.e., $v \sim v$, since one can always choose a directed path of length 0 and reach $v$ from itself and vice versa. Thus, $G[V_i]$ is a trivial strongly connected component if it contains only one vertex, i.e., $|V_i| = 1$. It can be easily shown that if $G[V_i]$ is a nontrivial strongly connected component of $G$, i.e., $|V_i| \geq 2$, then it must necessarily contain a $k$-cycle with $k \geq 2$. Thus, the presence of nontrivial strongly connected components in a digraph implies the presence of directed cycles. We use $\mu(G)$ to denote the number of strongly connected components in $G$. We say that $G$ itself is strongly connected if $\mu(G) = 1$. Figure 3.8(a) shows a digraph which has three strongly connected components as shown in Figure 3.8(b). Hence the digraph is not strongly connected, while its underlying undirected graph is connected, since it has only one component. This clearly illustrates the difference between strong connectedness in digraphs and connectedness in undirected graphs.
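
Strongly connected components can be computed in linear time. The sketch below uses the standard two-pass depth-first approach commonly attributed to Kosaraju; it is offered as one possible illustration and is not taken from the text. The function name and the list-of-pairs arc representation are assumptions.

```python
def strongly_connected_components(vertices, arcs):
    """vertices: iterable of names; arcs: list of (tail, head). Returns a list of components."""
    vertices = list(vertices)
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for v, w in arcs:
        succ[v].append(w)
        pred[w].append(v)

    # Pass 1: record vertices in order of DFS completion on G.
    order, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:                      # all successors explored: v is finished
                order.append(v)
                stack.pop()

    # Pass 2: DFS on the reversed digraph, in decreasing completion order.
    comps, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, stack = [], [root]
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```

On a hypothetical digraph with arcs a→b, b→c, c→a, c→d, d→e, e→d, and e→f, the sketch returns the three vertex sets {a, b, c}, {d, e}, and {f}, even though the underlying undirected graph has a single component.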

The in-degree $d_G^-(v)$ of a vertex $v$ in $G$ is the number of arcs having $v$ as their head vertex. Similarly, the out-degree $d_G^+(v)$ of a vertex $v$ is the number of arcs having $v$ as their tail vertex. Just as with undirected graphs, we shall use the symbols $\nu(G)$ and $\varepsilon(G)$ to denote the number of vertices and arcs in $G$. We shall also drop the letter $G$ from most of the notations whenever possible.

3.3.2  Presence  of  Feedback,  and  its  Detection 

Let $\Omega(N, M, \Sigma)$ be a partitioned network. Let $Y$ denote the set of I/O-triples of the network, i.e.,

$$Y = \bigcup_{n_i \in N} \mathrm{FIN}(n_i) \times \mathrm{FOUT}(n_i) \times \{n_i\},$$

and let $Y_a$ denote the set of adjacent I/O-triples in $Y$.

Definition 3.11 : A directed graph $G(V, A, \psi_G)$ is said to be derived from an NMOS partitioned network $\Omega(N, M, \Sigma)$ if there exist bijections $\theta : \Sigma \to V$ and $\phi : Y_a \to A$ such that the triple $\upsilon = (\Omega_j, \Omega_k, n_i) \in Y_a$ is an adjacent I/O-triple in the network if and only if $\psi_G(\phi(\upsilon)) = (\theta(\Omega_j), \theta(\Omega_k))$. Thus for every adjacent I/O-triple $\upsilon = (\Omega_j, \Omega_k, n_i)$ of the partitioned network, there is an arc $a = \phi(\upsilon)$ in the derived digraph $G$ with tail vertex $\theta(\Omega_j)$ and head vertex $\theta(\Omega_k)$. The digraph $G$ is said to be acyclic if it has no directed cycles. Just as with blocks in a network, we have sequential orderings on vertices of a digraph.

Definition 3.12 : A sequential ordering $R$ on the vertices of a digraph $G$ is said to be a topological ordering if for every arc $a$ with tail $v$ and head $w$ the strict inequality $R(v) < R(w)$ is satisfied.

Figure 3.8 : (a) A digraph G. (b) The three strongly connected components of G.

Theorem 3.3 : If $\Omega$ is a partitioned NMOS network and $G$ is its derived digraph, then the following three conditions are equivalent:

(1) there is a good ordering on the blocks of $\Omega$,

(2)  G  is  acyclic,  and 

(3)  there  exists  a  topological  ordering  on  the  vertices  of  G . 

Proof :

We shall first show that (1) $\Rightarrow$ (2). Suppose $R$ is a good ordering on the set of blocks $\Sigma$ of the network. We will show that the derived digraph cannot contain a directed cycle. Suppose $G$ has a directed $k$-cycle. If $k = 1$, then there is a loop $a$ with both ends at some vertex $v$. By Definition 3.11, there exists an adjacent I/O-triple $\phi^{-1}(a) = (\Omega_j, \Omega_j, n_i)$ in the network, where $\Omega_j = \theta^{-1}(v)$. This adjacent I/O-triple would clearly violate Definition 3.10 for $R$, since it would require $R(\Omega_j) < R(\Omega_j)$. If $k > 1$, then let $v$ denote the vertex in the $k$-cycle $C$ whose corresponding block $\Omega_j = \theta^{-1}(v)$ is ordered first by $R$ among the blocks corresponding to the vertices in the cycle, i.e., $R(\theta^{-1}(v)) < R(\theta^{-1}(w))$ for all $w \in C$ with $w \neq v$. Since $k > 1$ there is an arc $a$ from $w$ to $v$ in the cycle (and hence in $G$) with $w \neq v$. But this would mean that there is an adjacent I/O-triple $(\Omega_k, \Omega_j, n_i)$ in the network where $\Omega_k = \theta^{-1}(w)$, thus leading to $R(\Omega_k) < R(\Omega_j)$, which contradicts the above choice of the vertex $v$. Hence the proof by contradiction.

The fact that (2) ⇒ (3) is a well-known result on digraphs and can be found in most standard textbooks on graph theory, such as [50]. Hence we will only outline this part of the proof. Suppose G is an acyclic digraph. Then there must be a vertex of in-degree 0 in G. To see this, consider a longest directed path in G. If the first vertex of this path does not have in-degree 0, then it has an in-neighbor; if that in-neighbor lies on the path then G has a cycle, and otherwise G has a longer path. Either case is a contradiction. Hence pick a vertex, say v, whose in-degree is 0. The rest of the proof that G has a topological ordering is by induction on the number of vertices of G. The basis for induction is clearly satisfied for all digraphs containing only one vertex. Now suppose that all acyclic digraphs on fewer than ν vertices have a topological ordering. Let G have ν vertices. Then G − v has no cycles and has ν − 1 vertices, and so must have a topological ordering, say R'. Now let R be an ordering of G such that R(v) = 1 and R(w) = R'(w) + 1 for all other vertices w in G. Then clearly R is a topological ordering for G, since v has no incoming arcs and every other arc keeps the order given by R'.

The fact that (3) ⇒ (1) follows trivially from the definitions of good orderings of Σ, topological orderings of vertices in G and the fact that G is derived from Ω. □

We  now  introduce  the  concept  of  feedback  in  a  partitioned  network. 

Definition 3.13 : A partitioned NMOS network Ω is said to have feedback among its blocks if its derived digraph G has directed cycles. Thus Ω is feedback-free if G is acyclic and is internal feedback-free if G has no directed loops. A block Ω_j ∈ Σ is said to have internal feedback if the corresponding vertex θ(Ω_j) in G is incident with a directed loop.

It  is  clear  that  this  definition  of  feedback  in  the  networks  conforms  to  the  standard  notion  of 
feedback  in  circuits.  It  should  also  be  clear  now  why  we  only  considered  adjacent  I/O-triples  while 
constructing  the  derived  digraph.  Had  we  chosen  all  I/O-triples  to  create  arcs  in  G  we  would  have 
directed  loops  corresponding  to  every  nonadjacent  I/O-triple.  This  would  then  amount  to  declaring 
that  a  network  has  internal  feedback  simply  because  it  has  a  pullup  node  that  is  an  ioput  of  a  PTB, 
which  does  not  conform  to  our  usual  conception  of  feedback  in  circuits.  We  are  now  ready  to  say  that 
a  network  has  a  good  ordering  if  and  only  if  it  is  feedback-free.  We  state  this  result  without  proof 
below,  since  it  easily  follows  from  Theorem  3.3  and  the  definition  of  feedback-free  networks. 

Theorem 3.4 : A partitioned network Ω(N, M, Σ) has a good ordering on its partitioned blocks if and only if it is feedback-free.

A  good  ordering  of  the  blocks  in  a  feedback-free  network  can  easily  be  obtained  by  first  placing 
the  vertices  of  the  derived  digraph  (which  in  this  case  will  be  acyclic,  by  definition)  in  a  topological 
order and then placing the corresponding blocks of the network in the same order. If, however, the network has feedback (which is the more common case in present-day NMOS designs), the derived digraph contains directed cycles and hence no topological (good) ordering exists on its vertices (blocks). In this case, therefore, one must detect the blocks in the network that are within feedback loops, treat these as special blocks and place the rest of the blocks in a "good" ordering. We formalize these ideas below.

Definition 3.14 : If V_i is a set of vertices in a strongly connected component of G, then the corresponding set Σ_i = {θ⁻¹(v) : v ∈ V_i} of blocks in Σ is defined to be a strongly connected component (SCC) of the network. Thus we have a partition Σ_1, Σ_2, ..., Σ_μ of the blocks in Σ.

Let V_1, V_2, ..., V_μ denote the partition of the vertex set of the digraph G into strongly connected components. We define the condensation of G to be a digraph Ĝ consisting of vertices w_1, w_2, ..., w_μ with an arc having head w_i and tail w_j if and only if i ≠ j and there is an arc in G with head x ∈ V_i and tail y ∈ V_j. Consider the digraph G shown in Figure 3.8(a). Its condensation Ĝ, shown in Figure 3.9, is clearly acyclic. We will show that, for any digraph G, its condensation Ĝ is acyclic and hence, from Theorem 3.3, it has a topological ordering, which corresponds to an ordering of the SCCs of Σ. To this end we need the following intermediate result.

Lemma : If C denotes a directed cycle in the digraph G then all its vertices must be within a strongly connected component of G.

Proof : (See [50]). Consider any two vertices, say, x and y in V(C). Since C is a cycle, there is a directed path from x to y and also a return path from y to x in C. But C is a subdigraph of G and hence x is reachable from y and y is reachable from x in G. Therefore, by definition x and y must be in the same strongly connected component. □

Theorem 3.5 : The condensation Ĝ of any digraph G must be acyclic.

Proof : (See [50,53]). By definition, Ĝ has no directed loops, and so has no one-cycle. If C is a k-cycle in Ĝ with k > 1, then the vertices of G in the set ⋃ V_i, taken over the vertices w_i of C, must belong to a directed cycle and hence, from the above lemma, must all be in one strongly connected component, which is a contradiction. □


Our strategy to schedule the blocks of Σ for processing is to start by detecting the strongly connected components in the derived digraph G. We then obtain the condensation Ĝ of G and proceed to find a topological ordering in Ĝ. This then corresponds to some ordering on the SCCs of Σ. The processing of the network Ω then begins by processing the SCC ordered first, followed by the one ordered second and so on. An SCC is said to be simple if it contains only one block of Σ and that block has no internal feedback. A simple SCC is processed by algorithms described in Chapter 4. If an SCC is not simple then the blocks within it are processed using special techniques described in Chapter 6.

The algorithm presented below, well known as Tarjan's algorithm [31], partitions the vertex set of any digraph into its strongly connected components. A vertex w is an in-neighbor of the vertex v in G if (w,v) is an arc of G and is an out-neighbor of v if (v,w) is an arc of G. We use Adj⁻_G(v) and Adj⁺_G(v) to denote the sets of in-neighbors and out-neighbors of the vertex v in G. In Tarjan's algorithm two integers k[v] and L[v] are computed for each vertex v in the digraph G, known as the depth-first number and the lowpoint [53] respectively.

A digraph G is said to be a rooted digraph if it contains a vertex, say root, such that all vertices in G are reachable from root. In the case of the derived digraph G we try to make it a rooted digraph by inserting a new vertex called root and directing arcs from this new vertex to every vertex of in-degree 0 in the original G. In the original derived digraph G every vertex corresponding to an SRC block in the circuit must indeed have in-degree 0 and so the above notion is well defined. Further, if there is a vertex that is not reachable from the new vertex root then it is also not reachable from any of the vertices corresponding to the SRC blocks in the network. This means that the input signals would never propagate to such blocks in the network and so they need not be simulated. Hence we are only interested in simulating those blocks in the circuit that correspond to vertices that are reachable from the vertex root in the above new digraph. We will still refer to the modified derived digraph as G itself and will assume that it is a rooted digraph.
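The rooting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the digraph is given as a list of (tail, head) arcs, the helper names `add_root` and `reachable_from` are hypothetical, and the small example graph is invented rather than taken from the thesis.

```python
from collections import deque

def add_root(vertices, arcs, root="root"):
    """Return out-neighbor lists of G with a new vertex `root` joined
    to every vertex of in-degree 0 (hypothetical helper)."""
    adj = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for tail, head in arcs:
        adj[tail].append(head)
        indeg[head] += 1
    adj[root] = [v for v in vertices if indeg[v] == 0]
    return adj

def reachable_from(adj, root="root"):
    """Set of vertices reachable from `root` by a breadth-first search."""
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

# Invented example: X and Y form a cycle that no SRC block can reach.
vertices = ["SRC_1", "MFB_1", "MFB_2", "X", "Y"]
arcs = [("SRC_1", "MFB_1"), ("MFB_1", "MFB_2"), ("X", "Y"), ("Y", "X")]
adj = add_root(vertices, arcs)
live = reachable_from(adj)
```

In this example only the vertices in `live` would need to be simulated; X and Y are never driven by an input source.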


Algorithm 3.2

Input : A rooted digraph G(V, A), with a special vertex root.

Output: A partition of V − {root} into strongly connected components V_1, V_2, ..., V_μ.

procedure SCC_DETECT(G)
begin
    i ← 1;
    for each v ∈ V do
        MARK[v] ← "new";
    μ ← 0;
    initialize STACK to empty;
    v ← root;
    DFS(v);
end

procedure DFS(v)
begin
    MARK[v] ← "old";
    k[v] ← i;
    i ← i + 1;
    L[v] ← k[v];
    push v on STACK;
    for each vertex w ∈ Adj⁺_G(v) do
    begin
        if MARK[w] = "new" then
            DFS(w);
            L[v] ← MIN(L[v], L[w]);
        else if k[w] < k[v] and w ∈ STACK then
            L[v] ← MIN(k[w], L[v]);
        end if
    end
    if L[v] = k[v] and v ≠ root then
        μ ← μ + 1;
        V_μ ← ∅;
        repeat
            pop x from STACK;
            V_μ ← V_μ ∪ {x};
        until x = v;
    end if
end
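A minimal Python sketch of the depth-first procedure above may make the bookkeeping easier to follow. It assumes the digraph is stored as a dictionary of out-neighbor lists; the function name `scc_detect` and the example graph are illustrative only. The recursion mirrors the pseudocode, so very deep graphs would need an explicit stack or a raised recursion limit.

```python
def scc_detect(adj, root):
    """Partition the vertices reachable from `root` (excluding `root`)
    into strongly connected components, following Tarjan's scheme."""
    k = {}                 # depth-first number k[v]
    low = {}               # lowpoint L[v]
    stack, on_stack = [], set()
    components = []
    counter = [1]          # the running index i of the pseudocode

    def dfs(v):
        k[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in k:                       # MARK[w] = "new"
                dfs(w)
                low[v] = min(low[v], low[w])
            elif k[w] < k[v] and w in on_stack:
                low[v] = min(low[v], k[w])
        if low[v] == k[v] and v != root:
            comp = []
            while True:                          # repeat ... until x = v
                x = stack.pop()
                on_stack.discard(x)
                comp.append(x)
                if x == v:
                    break
            components.append(comp)

    dfs(root)
    return components

# Invented example: b and c form a feedback loop.
example = {"root": ["a"], "a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
sccs = scc_detect(example, "root")
```

The components come out in reverse topological order of the condensation, which is a known property of this style of depth-first search.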

The  above  algorithm  terminates  for  finite  digraphs  and  does  so  with  linear  time  complexity  and, 
furthermore,  correctly  partitions  the  vertices  of  the  digraph  into  strongly  connected  components.  This 
fact  follows  from  the  theorem  below  which  we  state  without  proof.  Its  proof  can  be  found  in  several 
books  on  graph  algorithms  such  as  [31],  [51],  and  [53]. 


Theorem 3.6 : The procedure SCC_DETECT(G) partitions the vertices of V into its strongly connected components correctly with time complexity of O(max(|V|, |A|)).

We now describe an algorithm that creates a new digraph Ĝ which is the condensation of the digraph G. We will use two procedures CREATE(x) and ADD_ARC(x,y) to create vertices and add arcs in the data structure that represents Ĝ. The data structure is the same as that for undirected graphs explained in Section 3.2.3, consisting of a list of vertices, and for each vertex an adjacency list, implemented as a linked list, of the out-neighbors of the vertex.


Algorithm 3.3

Input : A digraph G(V, A) with a partition V_1, V_2, ..., V_μ of its vertex set into strongly connected components. A function SCCOMP : V → {1, 2, ..., μ} such that for any vertex v ∈ V, if i = SCCOMP(v) then v ∈ V_i.

Output: The condensation digraph Ĝ of G.

procedure CONDENSE(G)
begin
    V(Ĝ) ← ∅;
    A(Ĝ) ← ∅;
    for i ← 1 until μ do
        CREATE(w_i);
    for each arc (x,y) ∈ A(G) do
    begin
        i ← SCCOMP(x);
        j ← SCCOMP(y);
        if i ≠ j then
            ADD_ARC(w_i, w_j);
        end if
    end
    return Ĝ;
end
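The condensation step can be sketched in Python as follows. The function name `condense`, the dictionary-based representation, and the example data are assumptions made for illustration. Unlike the pseudocode, this sketch also suppresses parallel arcs by remembering the pairs already added, which is a small departure chosen only to keep the result tidy.

```python
def condense(arcs, sccomp, mu):
    """Build the condensation of a digraph.

    arcs   : iterable of (x, y) arcs of G
    sccomp : dict mapping each vertex to its component index in 1..mu
    mu     : number of strongly connected components
    Returns out-neighbor lists keyed by component index.
    """
    cond_adj = {i: [] for i in range(1, mu + 1)}   # CREATE(w_i)
    added = set()
    for x, y in arcs:
        i, j = sccomp[x], sccomp[y]
        if i != j and (i, j) not in added:
            added.add((i, j))
            cond_adj[i].append(j)                  # ADD_ARC(w_i, w_j)
    return cond_adj

# Invented example: b and c belong to the same component.
arcs = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "d"), ("b", "d")]
sccomp = {"a": 1, "b": 2, "c": 2, "d": 3}
cond = condense(arcs, sccomp, 3)
```

Arcs inside a component are dropped, so the result has no loops, in line with the definition of the condensation.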

The above algorithm clearly is of time complexity O(ν + ε), where ν = |V(G)| and ε = |A(G)|. We finally present an algorithm to produce a topological ordering on the vertices of the digraph Ĝ, which is known to be acyclic from Theorem 3.5, and hence, from Theorem 3.3, must have such an ordering. This algorithm uses a QUEUE to store some vertices. One could also use a STACK instead, which would result in a different ordering. We use d⁻(w) and Adj⁺(w) to denote the in-degree and out-neighbors of vertex w ∈ V(Ĝ), i.e., we drop the subscript Ĝ from the usual notations for convenience.


Algorithm 3.4 [51]

Input : An acyclic digraph G(V, A), with μ = |V|.

Output: A 1-1 function R : V → {1, 2, ..., μ} such that for every arc (w_i, w_j) in A, R(w_i) < R(w_j).

procedure TOP_ORDER(G)
begin
    k ← 1;
    for each vertex w_i ∈ V do
    begin
        I[w_i] ← d⁻(w_i);
        if d⁻(w_i) = 0 then
            push w_i into QUEUE;
        end if
    end
    while QUEUE is not empty do
    begin
        pop vertex w_j from QUEUE;
        R(w_j) ← k;
        k ← k + 1;
        for each vertex w_m ∈ Adj⁺(w_j) do
        begin
            I[w_m] ← I[w_m] − 1;
            if I[w_m] = 0 then
                push w_m into QUEUE;
            end if
        end
    end
    return R;
end
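The queue-based ordering above can be sketched in Python. The function name `top_order` and the example digraph are illustrative assumptions; the cycle check at the end is an addition for safety and is not part of the pseudocode, which presumes an acyclic input.

```python
from collections import deque

def top_order(adj):
    """Return a dict R with R[u] < R[v] for every arc (u, v).

    adj maps each vertex to the list of its out-neighbors.
    """
    indeg = {w: 0 for w in adj}
    for w in adj:
        for x in adj[w]:
            indeg[x] += 1
    queue = deque(w for w in adj if indeg[w] == 0)
    rank, k = {}, 1
    while queue:
        w = queue.popleft()
        rank[w] = k          # R(w) <- k
        k += 1
        for x in adj[w]:
            indeg[x] -= 1
            if indeg[x] == 0:
                queue.append(x)
    if len(rank) != len(adj):
        raise ValueError("digraph has a directed cycle")
    return rank

# Invented acyclic example.
dag = {1: [2, 3], 2: [4], 3: [4], 4: []}
R = top_order(dag)
```

Replacing the queue with a stack would still give a valid topological ordering, only a different one.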

The topological ordering R on Ĝ provides us with an ordering ORD on the set of SCCs {Σ_1, Σ_2, ..., Σ_μ} such that ORD(Σ_i) = R(w_i), where w_i is the vertex of Ĝ corresponding to the SCC Σ_i.


3.4  An  Example  to  Illustrate  Partitioning  and  Ordering 

In this section we will consider the NMOS network shown in Figure 3.10 as an example to illustrate the partitioning and ordering algorithms described in the earlier sections of this chapter. This network consists of 17 nodes N = {n_0, n_1, ..., n_16} and 20 transistors M = {m_1, m_2, ..., m_20}. The set M_E = {m_1, m_2, ..., m_15} is the set of enhancement devices and M_D = {m_16, ..., m_20} is the set of depletion devices. The set of nodes can be partitioned into three classes according to their strengths, namely, the nodes of "input" strength

    {n_0, n_1, n_2, n_3, n_4, n_5},

the nodes of "pullup" strength

    N_P = {n_6, n_7, n_8, n_9, n_10},

and the nodes of "normal" strength

    {n_11, n_12, n_13, n_14, n_15, n_16}.

The node n_0 is the ground node and n_1 is the supply node to the network. The set of external nodes in this case is

    N_E = {n_2, n_3, n_4, n_5, n_11, n_12}.

The graph H representing this network is shown in Figure 3.11. We only show the nonisolated vertices in the graph. Also, we refer to the vertices and edges of the graph as nodes and transistors in the network, respectively, for the sake of convenience, i.e., in this case the bijections θ and φ used in Definition 3.2 are both identity mappings. The graph H_I obtained by splitting the nodes of input strength from H is shown in Figure 3.12 and the graph H_IP is shown in Figure 3.13(a). The graph H_IP has seven components. The subgraphs of H induced by the edges in each of these components are shown in Figure 3.13(b). Among these subgraphs, the subgraph C'_2 has two external nodes while C'_5 has two pullup nodes, and so the corresponding components C_2 and C_5 are declared as pass components. The rest of the components can easily be verified to be driver components. Thus, the set of driver transistors is

    M_D = {m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8, m_9, m_10}

and the set of pass transistors in the network is

    M_P = {m_11, m_12, m_13, m_14, m_15}.

Figure 3.13: (a) The graph H_IP for H in Figure 3.11. (b) The corresponding edge-induced subgraphs of H.

It can easily be verified that the subgraphs C'_2 and C'_5 are the P-blocks of H and the rest of the subgraphs in Figure 3.13(b) are the D-blocks of H. Thus the transistors in the network can now be partitioned into seven blocks, two of which are PTB's and the remaining five are MFB's. We provide below a listing of the transistors in each block along with the set of its input and output nodes. In the case of an MFB the first transistor in its list is a depletion load device.


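Block    Transistors                           Input Nodes                     Output Nodes
MFB_1    [...]                                 n_2, n_3                        n_6
MFB_2    [...]                                 [...]                           n_7
MFB_3    m_18, m_4, m_5, m_6, m_7, m_8         n_4, n_7, n_10, n_11, n_12      n_8
MFB_4    m_19, m_9                             n_5                             n_9
MFB_5    m_20, m_10                            [...]                           n_10
PTB_1    m_11, m_12                            n_3, n_4, n_6                   n_6, n_11, n_12
PTB_2    m_13, m_14, m_15                      n_4, n_5, n_7, n_8, n_9         n_8, n_9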

In  addition  to  the  seven  blocks  given  above,  the  network  also  has  five  SRCs  which  we  list  below 
along  with  the  node  of  input  strength  in  each  of  them. 


Block    Output Node
SRC_1    n_1
SRC_2    n_2
SRC_3    n_3
SRC_4    n_4
SRC_5    n_5

We  have  thus  partitioned  the  network  into  five  SRCs,  five  MFB’s  and  two  PTB’s.  We  are  now 
ready  to  form  the  fanin  and  fanout  lists  for  each  node  in  the  network.  The  table  below  gives  these 
lists  for  each  node  which  has  both  its  fanin  and  fanout  lists  nonempty. 


Fanin  List 


MFB34>TB2 

mfb4,ptb2 


Fanout  List 


MFB1J>TB1 

MFB2,\IFB>PTB1,PTB2 

mfb4>ptb2 


MFBj^PTB, 


PTBj 

PTBo^VIFBs 
3 
3 
3 


From the above table, we see that node n_6 is an ioput of PTB_1 and nodes n_8 and n_9 are ioputs of PTB_2. Hence, out of the 22 I/O-triples we get three nonadjacent triples, namely, (PTB_1, PTB_1, n_6), (PTB_2, PTB_2, n_8), and (PTB_2, PTB_2, n_9). The remaining 19 triples are adjacent I/O-triples. Given the
adjacent I/O-triples, we can construct the derived digraph G as shown in Figure 3.14. This digraph contains twelve vertices, one for each block, and 19 arcs. We also include a vertex "root" and join it to the five SRC vertices as shown in the same figure. Using Algorithm 3.2 on this digraph gives us ten strongly connected components which we list below.


SCC      Blocks
Σ_1      SRC_1
Σ_2      SRC_2
Σ_3      SRC_3
Σ_4      SRC_4
Σ_5      SRC_5
Σ_6      MFB_1
Σ_7      MFB_2
Σ_8      MFB_4
Σ_9      PTB_1
Σ_10     MFB_3, MFB_5, PTB_2

Thus, Σ_10 is the only SCC that is not simple. Since G has no self loops, the network has no internal feedback. Note that had we also considered the nonadjacent I/O-triples in constructing the derived digraph, we would get self loops. The network, however, has an SCC that is not simple, and hence has feedback among MFB_3, PTB_2, and MFB_5. The condensation digraph Ĝ is shown in Figure 3.15. From Algorithm 3.4 on Ĝ we get a topological ordering R such that R(w_i) = i; i = 1, 2, ..., 10. This induces an ordering ORD on the SCCs of the network, such that ORD(Σ_i) = i; i = 1, 2, ..., 10, in this case.

3.5  Conclusions 

In this chapter we began by representing an NMOS network Ω(N, M) as a set of nodes N interconnected by a set of NMOS devices M. We then partitioned the set of enhancement transistors into driver transistors and pass transistors. Following this, the driver transistors are grouped together to form MFB's while pass transistors are grouped together to form PTB's. Another type of block, called SRC, is introduced to model the input voltage sources connected to the input nodes of the network. The partitioned network is represented as Ω(N, M, Σ), where Σ is the set of partitioned blocks which could be MFB's, PTB's, or SRC's. We then introduced the concept of feedback among blocks in the network and showed that a good ordering for processing the various blocks is possible only for feedback-free networks. In case the network has feedback, the set of blocks is partitioned into its strongly connected components (SCCs). Finally, we came up with an ordering of the SCCs for processing. The partitioning and the ordering of the blocks have both been shown to take computation time that is linear in the number of circuit nodes and number of devices.


Let Ω(N, M, Σ) be a partitioned NMOS network in which the set of blocks Σ has been further partitioned into its strongly connected components (SCCs) Σ_1, Σ_2, ..., Σ_μ. Let ORD denote the ordering in which the SCCs have been scheduled for processing. If an SCC is simple, i.e., it consists of exactly one block (an MFB, PTB, or SRC) with no internal feedback, then it is simulated at the switch level by algorithms described in this chapter. By simulating or processing a block, we mean obtaining the ternary digital waveforms at the output(s) of the block given those at the inputs to the block over the entire time interval of interest. In case the SCC is not simple, a special event-driven windowing technique, to be described in Chapter 6, is used to simulate the various blocks within the SCC. This special technique partitions the entire time interval into several windows and uses the algorithms described in this chapter to simulate only the active blocks within each window.

4.1  Ternary  Signals  and  Sequences  of  Transitions 

Let (L, ∨, ∧, ¬) denote the ternary algebra on the set L = {0, u, 1} with binary operations OR (∨) and AND (∧), and a unary operation INVERSE (¬), as defined in Section 3.1. Let [t_0, t_f] denote the time interval in which the network is to be simulated. At each time instant, the signal at a node in the network is assumed to occupy a ternary value from L, i.e., a 0, u, or 1, while this value might change with time. Such a signal is called a ternary signal. A node n_i ∈ N is associated with a ternary digital waveform, denoted by X_i, which is a mapping X_i : [t_0, t_f] → L, such that X_i(t) is the ternary value of the signal at node n_i at time t ∈ [t_0, t_f]. A transition in a ternary signal is defined as a change in the ternary value of the signal taking place at a certain time instant. Thus, to completely specify a transition, we need to specify both the type of transition and the time at which it occurs. A transition type is an ordered pair (x, y) where x, y ∈ L and x ≠ y. There are six possible transition types, namely, (0,u), (u,1), (1,u), (u,0), (0,1), (1,0). In accordance with the fact that a ternary digital waveform has a corresponding analog waveform, given by the inverse of the transformation in Equation 3.1, only the first four of the six possible transition types are allowed. These allowable transition types are (0,u), (u,1), (1,u) and (u,0). We will consider only allowable transition types and, henceforth, drop the qualifier "allowable" whenever possible.

For the sake of convenience in implementation, the entire simulation time interval [t_0, t_f] is discretized by choosing a minimum resolvable time (MRT), denoted by h_min, so that a time point t can be represented by an integer k if t ∈ [t_0 + k·h_min, t_0 + (k+1)·h_min). Thus two different time points within this interval are considered indistinguishable and are represented by the same integer k and vice versa. If K = (t_f − t_0)/h_min, then the time at which a transition takes place within [t_0, t_f] can be denoted by an integer k ∈ [K] = {0, 1, 2, ..., K}. The value of h_min is usually chosen to be very small, typically one or two orders of magnitude smaller than the rise or fall times of the analog signals. We can now represent a transition α as an ordered triple (x, y, k) ∈ L × L × [K] where (x, y) ∈ L × L is the transition type and k denotes the time of its occurrence. Furthermore, x is the initial value of α and y is its final value.
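As a rough illustration of the discretization and of the triple (x, y, k), consider the following Python sketch. The names `time_index` and `make_transition` are hypothetical, ternary values are represented by the strings "0", "u" and "1", and ordinary floating-point division is assumed to be accurate enough for the chosen h_min.

```python
import math

# The four allowable transition types from the text.
ALLOWABLE = {("0", "u"), ("u", "1"), ("1", "u"), ("u", "0")}

def time_index(t, t0, h_min):
    """Integer k such that t lies in [t0 + k*h_min, t0 + (k+1)*h_min)."""
    return math.floor((t - t0) / h_min)

def make_transition(x, y, t, t0, h_min):
    """Build the triple (x, y, k) for a transition of type (x, y) at time t."""
    if (x, y) not in ALLOWABLE:
        raise ValueError(f"transition type {(x, y)} is not allowable")
    return (x, y, time_index(t, t0, h_min))

# Two time points inside the same interval map to the same integer.
k1 = time_index(1.00, 0.0, 0.5)
k2 = time_index(1.25, 0.0, 0.5)
alpha = make_transition("0", "u", 1.25, 0.0, 0.5)
```

A direct jump such as ("0", "1") is rejected, matching the restriction to allowable transition types.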

Let S = α_1, α_2, ..., α_p be a sequence of transitions where each α_j = (x_j, y_j, k_j). The sequence S is said to be chronological if k_1 < k_2 < ... < k_p. A chronological sequence is said to be compatible, in addition, if (x_j, y_j) is an allowable transition type for each 1 ≤ j ≤ p and y_j = x_{j+1} for each 1 ≤ j ≤ p − 1. In a compatible sequence, therefore, the final value of every term in the sequence is equal to the initial value of the succeeding term.
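The two properties just defined translate directly into predicates. The sketch below assumes transitions are stored as tuples (x, y, k) with string values "0", "u", "1"; the function names are illustrative.

```python
ALLOWABLE = {("0", "u"), ("u", "1"), ("1", "u"), ("u", "0")}

def is_chronological(seq):
    """True if the transition times are strictly increasing."""
    return all(a[2] < b[2] for a, b in zip(seq, seq[1:]))

def is_compatible(seq):
    """True if seq is chronological, every type is allowable, and each
    final value equals the initial value of the next term."""
    return (is_chronological(seq)
            and all((x, y) in ALLOWABLE for x, y, _ in seq)
            and all(a[1] == b[0] for a, b in zip(seq, seq[1:])))

good = [("0", "u", 10), ("u", "1", 70), ("1", "u", 100), ("u", "0", 120)]
bad_link = [("0", "u", 10), ("1", "u", 70)]     # u is not followed by u
bad_time = [("0", "u", 70), ("u", "1", 10)]     # times decrease
```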

Let t_k = t_0 + k·h_min and let X(t), t ∈ [t_0, t_f], be a ternary signal waveform such that no more than one transition occurs in a time interval [t_k, t_{k+1}) for any integer k. We will call such a waveform a proper waveform. Clearly any ternary waveform will be a proper waveform if h_min is chosen as suggested above. Henceforth, we will assume that such an h_min has been chosen and that all ternary waveforms are indeed proper. In a proper waveform, therefore, if a transition occurs at a real time t ∈ [t_k, t_{k+1}) then any other transition must occur in some other interval disjoint from this. We use the notations t⁻ and t⁺ to denote time points just before and just after the time t. We represent a proper waveform X by a sequence S of transitions as follows:

1. Initially, S ← ∅, and k ← 0.

2. If there is a t ∈ [t_k, t_{k+1}) such that X(t⁻) ≠ X(t⁺), then set x ← X(t⁻), y ← X(t⁺), and append the transition α = (x, y, k) to S.

3. Set k ← k + 1 and repeat step 2, until k = K.

If, however, a ternary signal is constant throughout the time interval [t_0, t_f] then it does not undergo any transitions. We represent such a signal by a sequence consisting of a single transition of a suitable type taking place before t_0. Thus a waveform that is always 0 is represented by (u, 0, −1), and a constant 1 signal by (u, 1, −1), where the integer −1 represents all time points t < t_0. A constant u signal, though seldom occurring in practice, can also be represented either by (0, u, −1) or (1, u, −1). We will adopt the convention that −1 will be used to denote transition times in the case of constant signals only.
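The three steps above, together with the convention for constant signals, can be sketched as follows. This is a simplification under an assumption not made in the text: the waveform is supplied as a list `samples` with `samples[k]` equal to X(t_k), so a transition inside [t_k, t_{k+1}) shows up as a difference between consecutive samples. The function name is hypothetical, and the constant u signal is arbitrarily encoded as (0, u, −1).

```python
def waveform_to_sequence(samples):
    """Convert samples X(t_0), X(t_1), ..., X(t_K) of a proper waveform
    into a sequence of transitions (x, y, k)."""
    seq = []
    for k in range(len(samples) - 1):
        x, y = samples[k], samples[k + 1]
        if x != y:                      # a transition inside [t_k, t_{k+1})
            seq.append((x, y, k))
    if not seq:                         # constant signal: time index -1
        v = samples[0]
        seq.append(("0" if v == "u" else "u", v, -1))
    return seq

pulse = waveform_to_sequence(list("00u11u0"))
const_one = waveform_to_sequence(list("111"))
const_u = waveform_to_sequence(list("uuu"))
```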

Let S_a = α_1, α_2, ..., α_p and S_b = β_1, β_2, ..., β_q be two sequences of transitions, and let X_a and X_b denote their corresponding ternary digital waveforms respectively. The waveform X_c such that X_c(t) = X_a(t) ∨ X_b(t) for each t ∈ [t_0, t_f] is called the "OR" of X_a and X_b. Similarly a waveform X_d is the "AND" of X_a and X_b if X_d(t) = X_a(t) ∧ X_b(t) for each t ∈ [t_0, t_f]. The sequence S_c of transitions that represents X_c is denoted by S_a ∨ S_b and S_d that represents X_d is denoted by S_a ∧ S_b. Also, we can define the "INVERSE" of a sequence S_a representing the waveform X_a to be the sequence of transitions, denoted by ¬S_a, representing the waveform whose value is ¬X_a(t) for each t ∈ [t_0, t_f]. We therefore have two binary operations ∨ and ∧ and one unary operation ¬ on sequences of transitions. As an illustration consider two (compatible) sequences of transitions:


S_a = (0,u,10), (u,1,70), (1,u,100), (u,1,110), (1,u,500), (u,0,600)

S_b = (1,u,200), (u,0,300), (0,u,700), (u,1,800).

The corresponding waveforms X_a and X_b are shown in Figure 4.1. The sequences obtained by performing the "OR" and "AND" operations on these two sequences are

S_a ∨ S_b = (1,u,500), (u,0,600), (0,u,700), (u,1,800)

and

S_a ∧ S_b = (0,u,10), (u,1,70), (1,u,100), (u,1,110), (1,u,200), (u,0,300)

respectively, and their corresponding waveforms X_c and X_d are also shown in Figure 4.1. The sequence obtained by performing the "INVERSE" operation on S_a is

¬S_a = (1,u,10), (u,0,70), (0,u,100), (u,0,110), (0,u,500), (u,1,600)

which is obtained by simply inverting each ternary value in every term of the sequence.
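One way to compute these operations is to sweep over the merged transition times while tracking the current value of each waveform. The Python sketch below takes that approach; it is not necessarily how the thesis implements the operators. It assumes the ternary ordering 0 < u < 1, so that OR is the maximum and AND is the minimum, and all function names are illustrative.

```python
RANK = {"0": 0, "u": 1, "1": 2}
SYMBOL = "0u1"

def t_or(a, b):
    return SYMBOL[max(RANK[a], RANK[b])]

def t_and(a, b):
    return SYMBOL[min(RANK[a], RANK[b])]

def t_not(a):
    return SYMBOL[2 - RANK[a]]

def initial_value(seq):
    """Value of the waveform at t_0: for a constant signal (time -1)
    this is the final value of its single term."""
    x, y, k = seq[0]
    return y if k == -1 else x

def combine(op, sa, sb):
    """Pointwise combination of two sequences under a binary operation."""
    va, vb = initial_value(sa), initial_value(sb)
    ea = {k: y for _, y, k in sa if k != -1}
    eb = {k: y for _, y, k in sb if k != -1}
    current = op(va, vb)
    out = []
    for k in sorted(set(ea) | set(eb)):
        va, vb = ea.get(k, va), eb.get(k, vb)
        new = op(va, vb)
        if new != current:
            out.append((current, new, k))
            current = new
    if not out:                          # constant result, time index -1
        out.append(("0" if current == "u" else "u", current, -1))
    return out

def inverse(seq):
    """Invert each ternary value in every term of the sequence."""
    return [(t_not(x), t_not(y), k) for x, y, k in seq]

Sa = [("0", "u", 10), ("u", "1", 70), ("1", "u", 100),
      ("u", "1", 110), ("1", "u", 500), ("u", "0", 600)]
Sb = [("1", "u", 200), ("u", "0", 300), ("0", "u", 700), ("u", "1", 800)]
S_or = combine(t_or, Sa, Sb)
S_and = combine(t_and, Sa, Sb)
S_inv = inverse(Sa)
```

Running the sketch on the two sequences of the illustration gives the same "OR", "AND" and "INVERSE" sequences as listed in the text.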

Two sequences S_a = {α_i}, i = 1, ..., p, and S_b = {β_i}, i = 1, ..., p, with the same number of terms, and where α_i = (x_i, y_i, k_i) and β_i = (x'_i, y'_i, k'_i), are type-equal if x_i = x'_i and y_i = y'_i for each 1 ≤ i ≤ p and time-equal if k_i = k'_i for each 1 ≤ i ≤ p. The two sequences are equal if they are both type-equal and time-equal. It must be noted that two sequences can be compared for equality if and only if they have the same number of terms. For example, the two sequences

(0,u,100), (u,1,200), (1,u,300), (u,0,400)

and

(0,u,110), (u,1,190), (1,u,250), (u,0,360)

are type-equal but not time-equal, whereas

(0,u,100), (u,1,200), (1,u,300), (u,0,400)

and

(0,u,100), (u,1,200), (1,u,300), (u,1,400)

are time-equal but not type-equal.


We now introduce notions of complete and partial pairs of transitions in a compatible sequence S_a = α_1, α_2, ..., α_p where α_i = (x_i, y_i, k_i).

Definition 4.1 : Two successive terms α_i and α_{i+1} in a compatible sequence with x_i ∈ {0, 1} are said to form a complete pair of transitions if y_{i+1} = ¬x_i and a partial pair if y_{i+1} = x_i. It must be noted that since the sequence is compatible and the transition types are allowable, the above choice of x_i forces y_i = x_{i+1} = u.

For example, the pair (0,u,100), (u,1,200) is a complete pair while (0,u,100), (u,0,200) is a partial pair. A complete pair of transitions corresponds to an analog waveform crossing both the threshold limits, thereby completing the transition, whereas a partial pair represents a potential glitch or a hazard.

Definition 4.2 : A compatible sequence of transitions is said to be a complete sequence if it has no partial pairs. The completion of a compatible sequence is the maximal compatible subsequence consisting of only complete pairs of transitions. For example, the completion of the sequence

(0,u,100), (u,1,200), (1,u,250), (u,1,260), (1,u,300), (u,0,350), (0,u,400), (u,0,420)

is the sequence

(0,u,100), (u,1,200), (1,u,300), (u,0,350).
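The completion can be computed by walking through the sequence pair by pair and keeping only the complete pairs. The sketch below assumes a compatible input; the function name `completion` is illustrative, and dropping a term that has no partner (for example a leading term that starts from u) is an assumption of this sketch, since the text does not say how such terms are treated.

```python
def completion(seq):
    """Return the subsequence of a compatible sequence made up of its
    complete pairs; partial pairs are removed."""
    out, i = [], 0
    while i < len(seq):
        x = seq[i][0]
        if x in ("0", "1") and i + 1 < len(seq):
            if seq[i + 1][1] != x:       # complete pair: y_{i+1} = NOT x_i
                out.extend(seq[i:i + 2])
            i += 2                       # a partial pair is skipped
        else:
            i += 1                       # unpaired term, dropped (assumption)
    return out

S = [("0", "u", 100), ("u", "1", 200), ("1", "u", 250), ("u", "1", 260),
     ("1", "u", 300), ("u", "0", 350), ("0", "u", 400), ("u", "0", 420)]
S_complete = completion(S)
```

Removing a partial pair leaves the sequence compatible, because such a pair starts and ends at the same value.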

Given two compatible and complete sequences S_a = {(x_i, y_i, k_i)}, i = 1, ..., p, and S_b = {(x_i, y_i, k'_i)}, i = 1, ..., p, that are type-equal but not necessarily time-equal, we define a measure on the difference in transition times between the two sequences to be

$$\rho(S_a, S_b) = \max_{1 \le i \le p} \frac{|k_i - k'_i|}{k_i} \qquad (4.1)$$

It must be noted that the above measure is defined only for complete sequences which are type-equal. Two compatible sequences (not necessarily complete) are said to be time-comparable if their completions are type-equal. If S_a and S_b are two time-comparable sequences, i.e., their respective completions S'_a and S'_b are type-equal, we then define an extended measure ρ to be

$$\rho(S_a, S_b) = \rho(S'_a, S'_b). \qquad (4.2)$$

As an example, consider two compatible sequences

S_a = (0,u,40), (u,0,50), (0,u,100), (u,1,200), (1,u,300), (u,0,400)

and

S_b = (0,u,110), (u,1,195), (1,u,260), (u,1,280), (1,u,330), (u,0,450)

whose completions are

S'_a = (0,u,100), (u,1,200), (1,u,300), (u,0,400)

and

S'_b = (0,u,110), (u,1,195), (1,u,330), (u,0,450)

respectively. Since S'_a and S'_b are type-equal, S_a and S_b are time-comparable and, in this case,
ρ(S_a, S_b) = ρ(S'_a, S'_b) = 12½%.
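The arithmetic behind this value can be checked with a small Python sketch. The function names `type_equal` and `rho` and the tuple representation are assumptions for illustration; the measure is taken to be the largest relative difference in transition times, normalized by the times of the first sequence, which is the reading that reproduces the 12½% figure quoted above.

```python
def type_equal(sa, sb):
    """True when both sequences have the same (from, to) states in the same order."""
    return len(sa) == len(sb) and all(a[:2] == b[:2] for a, b in zip(sa, sb))


def rho(sa, sb):
    """Largest relative difference in transition times of two complete, type-equal sequences."""
    if not type_equal(sa, sb):
        raise ValueError("sequences are not type-equal")
    return max(abs(a[2] - b[2]) / a[2] for a, b in zip(sa, sb))


# Completions of the two example sequences.
sa_c = [("0", "u", 100), ("u", "1", 200), ("1", "u", 300), ("u", "0", 400)]
sb_c = [("0", "u", 110), ("u", "1", 195), ("1", "u", 330), ("u", "0", 450)]
print(rho(sa_c, sb_c))  # 0.125
```

The four relative differences are 10%, 2.5%, 10%, and 12.5%, so the maximum is attained at the last transition.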

4.2  Switch-level  Simulation  of  a  Block 

Let S_i denote the sequence of transitions, computed by switch-level simulation, and let V_i be the
actual analog waveform at a node n_i ∈ N in the network. We can obtain the three-state digital
equivalent of V_i using the transformation in Equation 3.1. Let Ŝ_i denote the sequence of transitions
corresponding to this ternary digital equivalent. We define the aim of our switch-level timing simulator
to be the computation of an S_i that is time-comparable to Ŝ_i such that ρ(S_i, Ŝ_i) < ε, where ε is a
measure of the accuracy of the timing in the simulation. It must be noted that we are only interested in
guaranteeing the timing in case of complete pairs of transitions and not for partial pairs. However,
partial pairs will be included in the sequence to warn the user of a possible glitch or hazard at a node
in the network. In this section we will discuss algorithms that will compute a so-called zero-delay
sequence of transitions at the output nodes of a block in a simple SCC of the network. The complete pairs
of transitions are then delayed by a delay operator to be discussed in the next chapter, followed by a
filtering operation to produce sequences that represent realistic waveforms and improve the accuracy of
the timing in case of partial pairs of transitions.


4.2.1  Simulation  of  an  SRC 

Let Ω_c be an SRC with output node n_o in a partitioned NMOS network. Since an SRC, by
definition, does not have any inputs, its corresponding vertex cannot be in any directed cycle in the
derived digraph. Let V_o denote the analog waveform at node n_o during the time interval [t_0, t_f]. Since
NODTYP(n_o) = input, a description of V_o would be available in the input description of the network.
Thus, simulating an SRC would simply amount to computing the sequence of transitions S_o directly
from the analog waveform V_o as described below.

Algorithm  4.1 


Input  :  An SRC Ω_c with output node n_o,
          an analog waveform V_o(t) for t ∈ [t_0, t_f],
          and two threshold voltages V_L and V_H.

Output :  A sequence of transitions S_o representing the ternary
          equivalent of V_o.

procedure SRC_SIM (Ω_c)
begin
    S_o ← ∅;
    k ← 0;
    ind ← "constant";
    t_a ← t_0;
    t_b ← t_0 + t_step;
    repeat
        v_a ← V_o(t_a);
        v_b ← V_o(t_b);
        v_1 ← v_a − V_L;
        v_2 ← v_b − V_L;
        v_3 ← v_a − V_H;
        v_4 ← v_b − V_H;
        if (v_1 < 0 & v_2 > 0) then
            append (0,u,k) to S_o;
            ind ← "variation";
        else if (v_3 ≤ 0 & v_4 > 0) then
            append (u,1,k) to S_o;
            ind ← "variation";
        else if (v_3 ≥ 0 & v_4 < 0) then
            append (1,u,k) to S_o;
            ind ← "variation";
        else if (v_1 > 0 & v_2 < 0) then
            append (u,0,k) to S_o;
            ind ← "variation";
        end if
        k ← k + 1;
        t_a ← t_b;
        t_b ← t_b + t_step;
    until t_b > t_f;
    if ind = "constant" then
        if (v_1 < 0) then
            append (u,0,-1) to S_o;
        else if (v_1 ≥ 0 & v_3 ≤ 0) then
            append (0,u,-1) to S_o;
        else if (v_3 > 0) then
            append (u,1,-1) to S_o;
        end if
    end if
end
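The idea behind Algorithm 4.1 can be sketched in Python as follows. This is a simplified variant under stated assumptions rather than a transcription: the waveform is assumed to be already sampled into a non-empty list, each sample is classified as 0, u, or 1 first (values on a threshold are treated as u), and a jump from 0 to 1 or from 1 to 0 within one step is recorded as two transitions at the same integer time. The names `ternary`, `src_sim`, `v_low`, and `v_high` are hypothetical.

```python
def ternary(v, v_low, v_high):
    """Classify a voltage as '0', 'u', or '1' using two thresholds."""
    if v < v_low:
        return "0"
    if v > v_high:
        return "1"
    return "u"


def src_sim(samples, v_low, v_high):
    """Return the sequence of (from, to, k) transitions of a sampled waveform."""
    states = [ternary(v, v_low, v_high) for v in samples]
    seq = []
    for k in range(len(states) - 1):
        prev, cur = states[k], states[k + 1]
        if prev == cur:
            continue
        if "u" in (prev, cur):
            seq.append((prev, cur, k))
        else:
            # Both thresholds crossed within one step: pass through u.
            seq.extend([(prev, "u", k), ("u", cur, k)])
    if not seq:
        # Constant waveform: a single transition at integer time -1 sets the state.
        final = states[0]
        seq.append(("0" if final == "u" else "u", final, -1))
    return seq


wave = [0.0, 1.0, 2.5, 4.5, 5.0, 3.0, 0.5]
print(src_sim(wave, v_low=1.0, v_high=4.0))
```

Transitions are stamped with the index of the earlier sample of the interval in which the threshold crossing is detected.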

In the above algorithm, the indicator ind is used to decide whether the analog waveform crossed
any of the threshold limits. In case it does not, the sequence is set to the appropriate transition
occurring at integer time k = -1, i.e., at real time t < t_0. We now state the following theorem, the proof
of which is fairly obvious, but which is an important result to be used in the later sections.

Theorem 4.1 : The sequence S_o computed by Algorithm 4.1 is a compatible sequence and represents the
ternary equivalent of the analog waveform V_o.


4.2.2  Simulation  of  an  MFB 

Let Ω_f be an MFB that is to be simulated with n_o as its (unique) output node and INP(Ω_f) as its
input nodes. For each input node n_i let S_i denote the sequence of transitions at that node, and let z_i ∈ L
denote the ternary value of the node signal at some time instant. Also, let S_o be the sequence of
transitions to be computed and z_o denote an instantaneous value of the signal at node n_o. Let INTERN
denote the set of internal nodes within the MFB. As mentioned earlier, an MFB can be viewed as a
network of switches between the drain and source nodes of its driver transistors whose conduction states
are controlled by the ternary signals at the gate terminals. Since an MFB has no external nodes, by
definition, the sequences at its internal nodes need not be computed. The fundamental idea in
conventional switch-level simulation is that the signal at a node can only be changed by a signal at a
stronger node and can change the signals only at weaker nodes. In a proper MFB the only node stronger than
the output node (which is a pullup) is the ground node whose signal is always at 0. Hence, to compute
z_o one only has to compute the state of conduction of the switches connecting the output node to the
ground node. Thus, we can think of z_o as a special kind of ternary function of the input signals
{z_i : n_i ∈ INP(Ω_f)}. If the MFB is not proper, its output node signal is always at 1 irrespective of its
input signals since no internal node can influence the value of this signal. This specialized structure of
an MFB enables us to use a much simpler and more efficient algorithm for its simulation rather than
using the more complex conventional switch-level algorithms such as the ones used in MOSSIM [19] or
EXPRESS-II [25].

Before we describe the actual algorithms to simulate an MFB we digress briefly to study the
properties of some ternary functions. Let p be a positive integer and let L^p denote the pth Cartesian power
of L, i.e., L^p is the set of all ternary vectors (z_1, z_2, ..., z_p) where z_i ∈ L for each i = 1, 2, ..., p. A
p-variable ternary function f(z_1, z_2, ..., z_p) is a mapping f : L^p → L.

Definition 4.3 : A p-variable B-ternary function is a p-variable ternary function which is either
constantly 0 or 1, or obtained from its arguments z_1, z_2, ..., z_p by successive application of the algebraic
operations of ∨, ∧, or ¬. An example of a five-variable B-ternary function is

f(z_1, z_2, z_3, z_4, z_5) = ¬((z_1 ∧ z_3) ∨ (z_2 ∧ z_4) ∨ (z_1 ∧ z_5 ∧ z_4) ∨ (z_2 ∧ z_5 ∧ z_3)).    (4.3)
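A B-ternary function such as Equation (4.3) can be evaluated with a few lines of Python. The sketch below assumes the usual ternary algebra on L = {0, u, 1}, in which the values are ordered 0 < u < 1, ∧ is the minimum, ∨ is the maximum, and ¬ exchanges 0 and 1 while leaving u fixed; this agrees with the path-state rules used later in this section. The helper names `t_and`, `t_or`, `t_not`, and `f43` are hypothetical.

```python
ORDER = {"0": 0, "u": 1, "1": 2}   # assumed ordering 0 < u < 1
VALUES = "0u1"


def t_and(*xs):
    """Ternary product: the weakest (minimum) of the arguments."""
    return min(xs, key=ORDER.get)


def t_or(*xs):
    """Ternary sum: the strongest (maximum) of the arguments."""
    return max(xs, key=ORDER.get)


def t_not(x):
    """Ternary inversion: swaps 0 and 1, leaves u unchanged."""
    return VALUES[2 - ORDER[x]]


def f43(z1, z2, z3, z4, z5):
    """The five-variable B-ternary function of Equation (4.3)."""
    return t_not(t_or(t_and(z1, z3), t_and(z2, z4),
                      t_and(z1, z5, z4), t_and(z2, z5, z3)))


print(f43("1", "0", "1", "0", "0"))  # 0: the product z1 AND z3 is 1
print(f43("0", "0", "1", "1", "1"))  # 1: every product term is 0
print(f43("u", "0", "1", "0", "0"))  # u: no term is 1, one term is u
```

Restricted to the values 0 and 1, the same function is the ordinary Boolean complement of a sum of products.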

Associated with each variable z_i are two literals, namely, z_i and ¬z_i. Thus a p-variable B-ternary
function can have at most 2p literals. We will use the symbol w_j to denote a literal. The literal is said
to be in its normal form if w_j = z_i and in its inverted form if w_j = ¬z_i. A product term is a B-ternary
function that is obtained by successively performing the ∧ operation on its literals. For example, if
w_i1, w_i2, ..., w_ir are r literals, then the corresponding product term is w_i1 ∧ w_i2 ∧ ... ∧ w_ir. Since the
∧ operation on L is both associative and commutative, the order of the literals does not matter and
hence the product term is well-defined. Thus any p-variable B-ternary function f consisting of q
literals w_1, w_2, ..., w_q where q ≤ 2p can be expressed as

f = g_1 ∨ g_2 ∨ ... ∨ g_ν    (4.4)

where g_j is a product term of a subset of the q literals, for each j = 1, 2, ..., ν. This result follows
directly from the corresponding well-known result that any switching function can be expressed in a
sum of products form and can be found in any standard text book on switching theory, such as [54],
since the relevant laws of conventional two-valued Boolean algebra used in its proof are easily
extended to the ternary case. We will use the term sum as analogous to the ∨ operation and product as
analogous to the ∧ operation. Thus the result in the above Equation (4.4) can be simply stated as: any
B-ternary function can be expressed as a sum of products of its literals. Similarly it can also be shown
that any B-ternary function f can also be expressed as a product of sums of its literals [54], i.e.,

f = h_1 ∧ h_2 ∧ ... ∧ h_μ    (4.5)

where each h_j is a sum of a subset of literals.

We now introduce the notion of zero-delay through a block in a network. By this we mean that
there are no delay elements present in the block and that at any instant of time the ternary value of
the output signal can be determined from those at its input signals at the same instant of time. We say
that an MFB with p inputs z_1, ..., z_p realizes a p-variable ternary function f if the ternary output
signal z_o can be expressed as z_o = f(z_1, z_2, ..., z_p) while the MFB is assumed to operate in the
zero-delay mode.

The zero-delay value of z_o of an MFB of p driver transistors m_1, m_2, ..., m_p can be computed as
follows. Let z_i be the value of the signal at the gate node of m_i. Let H_f be the D-block in the graph
representing the network corresponding to the MFB. Each edge of H_f has a conduction state associated
with it which is equal to the ternary value of the gate signal of the corresponding driver transistor.
The state of a path P in the graph is defined as the product term of the states of the edges in the path.
If there is a path between the output vertex and the ground vertex with state 1, then clearly the signal
at the output node will be forced to have the value of the ground signal (which is stronger), which is a
0, i.e., z_o = 0 in this case. If all paths between the output and the ground vertices have state 0 then
z_o = 1. If there are no paths with state 1 and at least one path with state u then, in this case, z_o = u.
Let P_1, P_2, ..., P_s denote all the paths between the output vertex and ground vertex in the MFB and
let g_i denote the state of path P_i for each i = 1, 2, ..., s. Clearly, each g_i is a product term of the
ternary signals at the gate nodes of the transistors corresponding to the edges in path P_i. From the above
simple arguments it is clear that the ternary value of the output signal can be obtained by summing all
the g_i's and inverting the resulting sum, i.e.,

z_o = ¬(g_1 ∨ g_2 ∨ ... ∨ g_s).    (4.6)

Thus z_o is a p-variable B-ternary function of its arguments z_1, z_2, ..., z_p, which are the signals at the
input nodes to the MFB. It must be noted that in each product term g_i above, no literal appears in its
inverted form, i.e., all literals appear in their normal form. Such a product term will be referred to as
a normal product term. We now present some interesting results in the synthesis of networks composed
of MFB's to realize any combinatorial switching function.
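Equation (4.6) can be exercised directly by enumerating paths. The Python sketch below assumes a hypothetical five-transistor bridge network (two series branches from the output to ground with a bridging transistor between their midpoints), chosen because its four output-to-ground paths give exactly the product terms of Equation (4.3); the vertex names, the edge list, and the min/max ternary algebra are all assumptions made for illustration.

```python
from itertools import product

ORDER = {"0": 0, "u": 1, "1": 2}   # assumed ordering 0 < u < 1


def t_and(xs):
    return min(xs, key=ORDER.get)


def t_or(xs):
    return max(xs, key=ORDER.get)


def t_not(x):
    return "0u1"[2 - ORDER[x]]


# Hypothetical bridge: (vertex, vertex, index of the gate input controlling the edge).
EDGES = [("out", "a", 0), ("a", "gnd", 2), ("out", "b", 1), ("b", "gnd", 3), ("a", "b", 4)]


def simple_paths(edges, src, dst):
    """Yield every simple path from src to dst as a list of gate-input indices."""
    def walk(vertex, visited, gates):
        if vertex == dst:
            yield list(gates)
            return
        for x, y, g in edges:
            if vertex in (x, y):
                nxt = y if vertex == x else x
                if nxt not in visited:
                    yield from walk(nxt, visited | {nxt}, gates + [g])
    yield from walk(src, {src}, [])


def zero_delay_output(edges, z):
    """Invert the sum of all path states; with no path at all the output is 1."""
    states = [t_and([z[g] for g in gates]) for gates in simple_paths(edges, "out", "gnd")]
    return t_not(t_or(states)) if states else "1"


def f43(z):
    z1, z2, z3, z4, z5 = z
    return t_not(t_or([t_and([z1, z3]), t_and([z2, z4]),
                       t_and([z1, z5, z4]), t_and([z2, z5, z3])]))


agree = all(zero_delay_output(EDGES, z) == f43(z) for z in product("0u1", repeat=5))
print(agree)  # True
```

Path enumeration is exponential in general, which is one reason to prefer a reduction that works on the graph itself instead of on its paths.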

Theorem 4.2 : Any p-variable B-ternary function f(z_1, z_2, ..., z_p) that can be expressed as the
inversion of a sum of normal product terms, as in Equation (4.6), can be realized by a single MFB with p
input nodes.

Proof : We begin constructing an MFB with a supply node (connected to a power supply V_DD), a
ground node, and p input nodes n_1, n_2, ..., n_p such that the ternary signal at n_i is z_i for each
i = 1, 2, ..., p. We then include a depletion transistor with drain node connected to the supply and
source and gate nodes tied together at a node n_o which we will call the output node of the MFB. We
now introduce the notion of a series chain of transistors which will be made use of in the construction
of the driver block of the MFB.

A set of δ transistors is said to form a δ series chain if the subgraph induced by the edges
corresponding to these transistors in the graph representing the network is a path of length δ. The
nodes corresponding to the end vertices of the path will be called the end nodes of the chain. An
example of a 4 series chain is shown in Figure 4.2.

Let the B-ternary function be expressed in the required form as

f = ¬(g_1 ∨ g_2 ∨ ... ∨ g_s)

where g_j is a product term of δ_j normal literals for each j = 1, 2, ..., s. Corresponding to each g_j we
insert a δ_j series chain of enhancement transistors with one end node as n_o and the other as the ground
node. Each transistor in a series chain is associated with a normal literal appearing in the corresponding
product term. The gate node of a transistor corresponding to a literal z_i is connected to the input node
n_i. We thus have s series chains of enhancement transistors connected in parallel across the output node
n_o and the ground node. It can easily be verified that such a configuration would correspond to a
D-block in a graph representing the network and hence the subnetwork we have constructed constitutes
an MFB. Furthermore this MFB would realize the required B-ternary function. □
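The construction used in this proof is mechanical enough to sketch in Python. The function below is a hypothetical illustration, not the thesis's own procedure: it takes each normal product term as a list of input indices and emits one series chain of driver transistors per term between the output node and ground. The node naming scheme (`out`, `gnd`, `c<j>_<k>`) is an assumption.

```python
def build_series_parallel_mfb(product_terms):
    """Return driver transistors as (node, node, gate_input) triples.

    product_terms: list of product terms, each a list of input indices whose
    literals appear in normal form. Each term becomes one series chain.
    """
    transistors = []
    for j, term in enumerate(product_terms):
        # A chain of d transistors needs d - 1 fresh internal nodes.
        nodes = ["out"] + [f"c{j}_{k}" for k in range(1, len(term))] + ["gnd"]
        for k, gate in enumerate(term):
            transistors.append((nodes[k], nodes[k + 1], gate))
    return transistors


# Product terms of Equation (4.3): z1 z3, z2 z4, z1 z5 z4, z2 z5 z3.
terms_43 = [[1, 3], [2, 4], [1, 5, 4], [2, 5, 3]]
mfb = build_series_parallel_mfb(terms_43)
print(len(mfb))  # 10 driver transistors
for t in mfb[:2]:
    print(t)
```

The transistor count of such a realization is simply the total number of literals over all product terms.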

The simplest proper MFB is an inverter consisting of exactly one driver transistor as shown in
Figure 4.3(a). Even simpler than this is an MFB with no driver transistors, in which case the ternary
signal at the output is always at 1 (which incidentally is a B-ternary function by definition). Figures
4.3(b) and (c) show two-input NAND and NOR gates respectively. As an illustration of the technique
used in the proof of the above theorem we consider the five-variable B-ternary function given in
Equation (4.3). An MFB with ten driver transistors realizing this function is shown in Figure 4.4. This
consists of four series chains connected in parallel across the output of the MFB and the ground node,
consisting of two, two, three, and three driver transistors respectively. One measure of the complexity of
an MFB could be chosen as the number of driver transistors in the MFB. It must be noted that
Theorem 4.2 does not say anything about the uniqueness of the MFB realization. In fact there could be
several MFB's realizing the same B-ternary function. Figure 4.5 shows another MFB realizing the same
B-ternary function as in Equation (4.3). This MFB, in fact, has only five driver transistors and is an
example of using bridged configurations to reduce the number of transistors in a series-parallel
realization.

The B-ternary functions considered by Theorem 4.2 are of a rather restricted nature. If we relax
the requirement that only a single MFB be used in the realization, we can consider a subnetwork of
MFB's realizing any general B-ternary function. The number of levels in a subnetwork composed of
blocks can be defined as the length of the longest directed path in the corresponding subdigraph within
the digraph derived from the partitioned network. The following result shows that any B-ternary
function can be realized by a two-level subnetwork of MFB's.

Theorem 4.3 : Let f(z_1, z_2, ..., z_p) be any p-variable B-ternary function. Then f can be realized by an
at most two-level subnetwork consisting of θ+1 MFB's with θ of these MFB's being simple inverters,
where θ ≤ p.

Proof : Let f = h_1 ∧ h_2 ∧ ... ∧ h_s be the product of sums expression of the B-ternary function f.
Since ¬(¬(f)) = f we can rewrite the function as

f = ¬(g_1 ∨ g_2 ∨ ... ∨ g_s)

where g_j = ¬(h_j) can be easily shown to be a product term for each j = 1, 2, ..., s, through simple
ternary algebraic manipulations. The rest of the proof is very similar to that of Theorem 4.2 in that an
MFB is constructed with a series chain for each product term and the number of transistors in a series
chain equal to the number of literals (both normal and inverted) in the corresponding product term. If
all literals appearing in the product terms are in their normal form, then from Theorem 4.2, a one-level
realization can be obtained. If a literal ¬z_i appears in a product term g_j in its inverted form, the gate
node of the corresponding transistor in the series chain is connected to the output of an inverter whose
input is connected to node n_i. Clearly, the number of inverters needed is equal to the number of
distinct literals appearing in their inverted form in the product terms, which is at most p. It can be
easily verified that the output of the MFB apart from the inverters in the subnetwork is the required
B-ternary function of the signals at the input nodes of the subnetwork. Furthermore, this is a two-level
subnetwork. □

As an illustration consider the following three-variable B-ternary function

f_1 = (z_1 ∧ z_2) ∨ ((¬z_1 ∨ z_2) ∧ z_3)

which can be expressed in its product of sums form as

f_1 = (z_1 ∨ ¬z_1 ∨ z_2) ∧ (z_1 ∨ ¬z_1 ∨ z_3) ∧ (¬z_1 ∨ z_2) ∧ (z_2 ∨ z_3) ∧ (z_1 ∨ z_3)
      ∧ (¬z_1 ∨ z_2 ∨ z_3) ∧ (z_1 ∨ z_2 ∨ z_3).

Using simple algebraic manipulations this reduces to

f_1 = ¬((¬z_1 ∧ z_1 ∧ ¬z_2) ∨ (¬z_1 ∧ z_1 ∧ ¬z_3) ∨ (z_1 ∧ ¬z_2) ∨ (¬z_2 ∧ ¬z_3) ∨ (¬z_1 ∧ ¬z_3)
      ∨ (z_1 ∧ ¬z_2 ∧ ¬z_3) ∨ (¬z_1 ∧ ¬z_2 ∧ ¬z_3)),

which is in the required form as an inverse of a sum of product terms. We then have a series chain for
each product term above. In the first series chain, the gate of the first transistor is connected to the
output of an inverter whose input is connected to node n_1, the gate of the second transistor is connected
directly to node n_1, while the gate of the third is connected to the output of another inverter with
input node n_2. This is repeated for each of the remaining series chains. The complete realization
involving an MFB with 18 driver transistors and three other inverters is shown in Figure 4.6(a). A
much simpler realization with an MFB containing only six driver transistors and three inverters is
shown in Figure 4.6(b). Thus, Theorem 4.3 only guarantees the existence of an MFB that realizes a
B-ternary function, and its proof describes a technique to construct one such realization using series chains
of transistors connected in parallel across the output node and ground. However, it may be possible to
construct another MFB to realize the same B-ternary function using a different design philosophy, and it
may turn out to be even simpler than the first realization. Therefore, MFB's play a very important role
in NMOS designs since any combinatorial switching function, which is a restriction of a B-ternary
function to the two-valued Boolean algebra, can be realized by an at most two-level subnetwork
composed of only MFB's according to Theorem 4.3. In practical designs, however, the designer may want
to realize several combinatorial switching functions in the same subnetwork which might require more
levels. Furthermore, the use of pass transistors in realizing combinatorial logic [56] sometimes yields
NMOS designs with better performance.
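The two forms of the three-variable example can be compared exhaustively over all 27 ternary input vectors. The Python sketch below assumes the min/max ternary algebra with ordering 0 < u < 1; the function names `f1` and `f1_two_level` are hypothetical, and the seven chains follow the inverse sum of products form obtained by inverting each sum term of the product of sums expression.

```python
from itertools import product

ORDER = {"0": 0, "u": 1, "1": 2}   # assumed ordering 0 < u < 1


def t_and(*xs):
    return min(xs, key=ORDER.get)


def t_or(*xs):
    return max(xs, key=ORDER.get)


def t_not(x):
    return "0u1"[2 - ORDER[x]]


def f1(z1, z2, z3):
    """Original form: (z1 AND z2) OR ((NOT z1 OR z2) AND z3)."""
    return t_or(t_and(z1, z2), t_and(t_or(t_not(z1), z2), z3))


def f1_two_level(z1, z2, z3):
    """Two-level form: three inverters feeding one MFB with seven series chains."""
    n1, n2, n3 = t_not(z1), t_not(z2), t_not(z3)   # outputs of the three inverters
    chains = [t_and(n1, z1, n2), t_and(n1, z1, n3), t_and(z1, n2), t_and(n2, n3),
              t_and(n1, n3), t_and(z1, n2, n3), t_and(n1, n2, n3)]
    return t_not(t_or(*chains))


same = all(f1(*z) == f1_two_level(*z) for z in product("0u1", repeat=3))
driver_transistors = 3 + 3 + 2 + 2 + 2 + 3 + 3   # one per literal in each chain
print(same, driver_transistors)  # True 18
```

Note that the term containing both z1 and its inverse cannot be dropped in the ternary case, since it evaluates to u when z1 = u.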

We will now describe the algorithm to simulate an MFB with no internal feedback. The algorithm
begins by first assuming that the MFB is in a zero-delay mode and computes a sequence of transitions,
called the zero-delay sequence, at its output node. Each transition in the zero-delay sequence is
then delayed by a delay operator followed by a filtering process that produces a chronological and
compatible delayed sequence. In this section we will focus our attention only on obtaining the zero-delay
sequence at the output node of the MFB given the sequences of transitions at its input nodes. The delay
and filtering operations will be discussed in Chapter 5.

Consider an MFB with a set of driver transistors M_f = {m_1, m_2, ..., m_p}, a set of input nodes
INP_f = {n_1, n_2, ..., n_p} and an output node n_o, where n_i = GATE(m_i) for each i = 1, 2, ..., p. Let S_i be
the sequence of transitions at node n_i with transition times between integers K_1 and K_2. In the case of
MFB's in simple SCCs we can assume K_1 = 0 and K_2 = K, i.e., the input sequences are known for the
entire time interval. In other situations the values of K_1 and K_2 would be decided by an algorithm to
process blocks in a general SCC to be discussed in Chapter 6. Let H_f be the D-block corresponding to
the MFB in the graph representing the network. Each edge e_i corresponding to transistor m_i is
associated with an edge sequence S(e_i) which is initially set to S_i. Two edges in a graph are said to be
parallel if they have the same end vertices. A simple graph is a graph with no self loops and no parallel
edges. The simplification of a graph is a graph obtained by collapsing all parallel edges into a single
edge whose edge sequence is the sum (∨) of the sequences of the parallel edges. We define the
elimination of a vertex v from a simple graph as a procedure involving the following two steps :

(1) For every pair of vertices a and b adjacent to v in the graph, add an edge between a and b with
    the edge sequence of this new edge being the product (∧) of the sequences corresponding to the
    edges ⟨v,a⟩ and ⟨v,b⟩, respectively.

(2) Delete the vertex v (and all edges incident on it) from the new graph obtained in step (1).

It must be noted that eliminating a vertex from a simple graph could create parallel edges in the
new graph. If we treat the graph H_f as a two-terminal network of switches between the output vertex
and the ground vertex, we can define a transmission function T that denotes the state of "conduction"
between the output and ground vertices as follows :

a) Each edge of the graph represents a switch whose state at any instant of time could be open,
   intermediate, or closed, denoted by symbols 0, u, or 1, respectively. Thus the edge sequence
   represents the variation of the state of the switch with time and can be defined to be the
   transmission function through the edge.

b) The transmission function through a path is defined to be the product (∧) of the transmission
   functions through the edges in the path.

c) The transmission function T between the output vertex and ground is the sum (∨) over all
   possible paths between the two vertices of the transmission function through each path.

Clearly, T is a sequence of transitions and S_o = ¬T.

Theorem 4.4 : The operations of simplification of a graph and internal vertex elimination in a simple
graph do not alter the transmission function between the output vertex and the ground vertex in the
graph.

Proof : Let us consider a set of parallel edges E = {e_1, e_2, ..., e_q} between vertices a and b in a graph
H. Let us partition the set of all paths Π between the output vertex and the ground vertex in H into
two sets, namely, Π̂ and Π', where Π̂ is the set of all paths containing an edge e_i ∈ E and Π' is the set
not containing any e_i ∈ E. Let H_1 be the graph obtained from H by replacing the set E by a single edge e'
between a and b, with S(e') = S(e_1) ∨ S(e_2) ∨ ... ∨ S(e_q). If Π̂_1 denotes all paths between the output
vertex and the ground vertex in H_1 containing e', then clearly the set of all paths Π_1 between output
and ground vertices in H_1 is Π_1 = Π̂_1 ∪ Π'. Let T(P) denote the transmission function through a path P
and let T(Π) denote the sum of transmission functions through each path in the set Π. The
transmission function between output vertex and ground vertex in H is clearly

T = T(Π̂) ∨ T(Π')

while that in H_1 is T(Π̂_1) ∨ T(Π'). Let P be some path in Π̂_1 and F = P − e'. Clearly, F is either a path or
a union of two disjoint paths. In either case let T(F) denote the product of the transmission functions
through the edges in F. It is also easy to see that P_i = F + e_i is a path in H for each i = 1, 2, ..., q and

T(P) = T(F) ∧ S(e') = T(P_1) ∨ T(P_2) ∨ ... ∨ T(P_q).

Therefore, T(Π̂) = T(Π̂_1), and so the transmission function between the output vertex and ground vertex
in H is the same as that in H_1. We can repeat the same argument for a set of parallel edges in H_1 and so
on until we end up with a simple graph. Hence, the transmission function between two vertices in a
graph does not change on simplification of the graph.

Now let us consider a simple graph H and an internal vertex v in the graph. Let Π_v be the set of
paths from the output vertex to the ground vertex containing the vertex v and let Π' be the ones
without v. If Π denotes the set of all paths between the output and ground vertices in H, then clearly
the transmission function T = T(Π) = T(Π_v) ∨ T(Π'). Suppose the degree of v in H is q and let
Adj_H(v) = {w_1, w_2, ..., w_q}. Since H is simple, all vertices adjacent to v must be distinct. Let e_i denote
the edge joining v and w_i in H. Let H_1 denote the graph obtained from H by eliminating v. Let the
new edge that joins w_i and w_j in H_1 be denoted by e_ij. By definition S(e_ij) = S(e_i) ∧ S(e_j). Let
E_q = {e_ij : i, j = 1, 2, ..., q; i ≠ j}. Let Π_1 denote the set of all paths between the output vertex and
ground in H_1. If Π̂ denotes the set of paths between the output vertex and ground vertex in H_1 that
contain edges from E_q, then clearly Π_1 = Π̂ ∪ Π'. We can divide the set Π̂ into two disjoint subsets, Π̂_1
containing only one edge from E_q and Π̂_2 containing more than one edge from E_q. It can be easily
verified that given any path P_2 ∈ Π̂_2 there exists a path P_1 ∈ Π̂_1 such that the terms in T(P_2) are
subsumed by the terms of T(P_1), i.e., T(P_1) ∨ T(P_2) = T(P_1). Therefore, T(Π̂) = T(Π̂_1). Given a path P ∈ Π_v
such that w_i and w_j are the vertices adjacent to v on this path, we can construct a path P_1 such that
P_1 = P − v + e_ij. Clearly, P_1 ∈ Π̂_1 and T(P) = T(P_1). Thus there is a 1-1 correspondence between paths in
Π_v and Π̂_1, and T(Π_v) = T(Π̂_1). Therefore,

T(Π_1) = T(Π̂) ∨ T(Π') = T(Π̂_1) ∨ T(Π') = T(Π_v) ∨ T(Π') = T(Π) = T,

and hence the theorem is proved. □

The algorithm to obtain the zero-delay sequence of transitions at the output node of an MFB begins
with the simplification of the D-block corresponding to the MFB. It then picks an internal vertex in
this simple graph and eliminates it and then simplifies the resultant graph. This process of elimination
followed by simplification is repeated for each internal vertex. The end result would be a simple graph
on two vertices, namely, the output vertex and the ground vertex. If the MFB is proper, then its
D-block is a connected graph containing the ground node, and so the graph resulting from the elimination
of all internal vertices followed by successive simplification would have an edge between the output
and ground vertices. From Theorem 4.4, the transmission function between the output and ground
vertices is the sequence associated with this single edge, and S_o would be the inverse of this sequence. Once
the zero-delay sequence is obtained the transition times are delayed by a delay operator and the whole
sequence is filtered using techniques to be discussed in Chapter 5.

Algorithm  4.2 


Input  :  An MFB Ω_f, a set M_f of driver transistors,
          a sequence of transitions S_i at the gate node of each
          m_i ∈ M_f, the D-block H_f = (V_f, E_f) of the MFB
          with all vertices in V_f apart from the output vertex and the
          ground vertex marked as "internal".
          K_1 and K_2 are the end points of an interval during
          which simulation is to be performed.

Output :  A sequence S_o of transitions at the output node n_o of the MFB.

procedure MFB_SIM (Ω_f, K_1, K_2)
begin
    S_o ← ∅;
    for each edge e_i ∈ E_f do
        S(e_i) ← WINDOW (S_i, K_1, K_2);
    H_0 ← SIMPLIFY (H_f);
    j ← 0;
    while there exists a vertex v in H_j marked "internal" do
    begin
        H' ← ELIMINATE (v, H_j);
        H_{j+1} ← SIMPLIFY (H');
        j ← j + 1;
    end
    if there exists an edge in H_j then
        e_o ← edge in H_j;
        S_o ← ¬S(e_o);
        DELAY_FILTER (S_o, Ω_f);
    else
        append (u,1,-1) to S_o;
    end if
end

procedure ELIMINATE (v, H)
begin
    H' ← H;
    for each pair of vertices w_i, w_j ∈ Adj_H(v) do
    begin
        e_i ← ⟨v, w_i⟩;
        e_j ← ⟨v, w_j⟩;
        add a new edge e_ij in H' joining w_i and w_j;
        S(e_ij) ← S(e_i) ∧ S(e_j);
    end
    return H' − v;
end

In the above algorithm, the choice of v as an internal vertex picked for elimination from H_j is
important from the complexity point of view. If the degree of v in H_j is q, then eliminating v adds
q(q−1)/2 new edges and deletes the q edges incident on v, so the net increase in the number of edges is
q(q−1)/2 − q, which is equal to q(q−3)/2. Note that q ≥ 2 if v is to be on a path in H_j. If q = 2 then the
new graph has one edge less than the number in H_j, while the number of edges is unchanged if q = 3.
Hence a vertex of lowest degree in H_j is picked as the best candidate for elimination. The procedure
WINDOW returns a sequence of those transitions occurring between K_1 and K_2 in its input sequence.
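The elimination and simplification steps can be tried out on instantaneous ternary values instead of full transition sequences. The Python sketch below is a simplified illustration under that assumption: it omits WINDOW and DELAY_FILTER, uses the min/max ternary algebra with ordering 0 < u < 1, and checks the result on a hypothetical five-transistor bridge against Equation (4.3). All function and vertex names are assumptions.

```python
from itertools import product

ORDER = {"0": 0, "u": 1, "1": 2}   # assumed ordering 0 < u < 1


def t_and(a, b):
    return min(a, b, key=ORDER.get)


def t_or(a, b):
    return max(a, b, key=ORDER.get)


def t_not(x):
    return "0u1"[2 - ORDER[x]]


def simplify(edges):
    """Merge parallel edges by summing (OR) their states; self loops are dropped."""
    merged = {}
    for a, b, s in edges:
        if a != b:
            key = frozenset((a, b))
            merged[key] = t_or(merged[key], s) if key in merged else s
    return merged


def eliminate(graph, v):
    """Remove v and bridge every pair of its neighbours with the product (AND) state."""
    incident = {next(iter(k - {v})): s for k, s in graph.items() if v in k}
    edges = [(*k, s) for k, s in graph.items() if v not in k]
    nbrs = list(incident)
    for i in range(len(nbrs)):
        for j in range(i + 1, len(nbrs)):
            edges.append((nbrs[i], nbrs[j], t_and(incident[nbrs[i]], incident[nbrs[j]])))
    return edges


def mfb_output(edges, out="out", gnd="gnd"):
    graph = simplify(edges)
    while True:
        internal = sorted({x for k in graph for x in k} - {out, gnd})
        if not internal:
            break
        v = min(internal, key=lambda x: sum(x in k for k in graph))  # lowest degree first
        graph = simplify(eliminate(graph, v))
    t = graph.get(frozenset((out, gnd)))
    return "1" if t is None else t_not(t)   # no conducting edge left: output stays at 1


def bridge(z1, z2, z3, z4, z5):
    return [("out", "a", z1), ("a", "gnd", z3), ("out", "b", z2),
            ("b", "gnd", z4), ("a", "b", z5)]


def f43(z1, z2, z3, z4, z5):
    total = "0"
    for term in (t_and(z1, z3), t_and(z2, z4),
                 t_and(t_and(z1, z5), z4), t_and(t_and(z2, z5), z3)):
        total = t_or(total, term)
    return t_not(total)


agree = all(mfb_output(bridge(*z)) == f43(*z) for z in product("0u1", repeat=5))
print(agree)  # True
```

Unlike path enumeration, each elimination touches only the neighbours of one vertex, which is why the choice of a low-degree vertex keeps the intermediate graphs small.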

At this stage, we would like to point out that the procedures used in Algorithm 4.2 can be used to
compute the transmission function between any two nodes in a two-terminal switching network
provided the states at the internal nodes in such a network are not required for simulating other blocks in
the network. In the case of an MFB, by definition, such a switching network exists, naturally, between
the pullup node of the MFB and the ground node. Now let us consider a PTB which is viewed as a
network of switches between the drain and source nodes of its pass transistors. A general PTB would
clearly result in a multiport switching network. Once again, in general, one would be required to
compute the states at several nodes within such a network since these could be external nodes according to
our definitions in Chapter 3. Furthermore, the delay characteristics of PTB's are different from those of
MFB's as will be seen in Chapter 5. Hence we choose to differentiate between MFB's and PTB's and we
simulate them using different techniques.

4.2.3  Simulation  of  a  PTB 

Let β_t be a PTB with a set of pass transistors M_t. Let
NDS_t = {DRAIN(m_i), SOURCE(m_i) : m_i ∈ M_t} be the set of drain and source nodes of the pass
transistors in the PTB and let NG_t = {GATE(m_i) : m_i ∈ M_t} be the set of gate nodes. Consider the
set Θ of transition times of the signals at the gate nodes, arranged in ascending order. These time
points divide the time interval of simulation into several phases such that during each phase
φ_j = (k_j, k_{j+1}) the signal at each gate node in NG_t is at a fixed ternary value, i.e., a 0, u,
or 1. The time k_j is the initial time and the time k_{j+1} is the final time of phase φ_j. Let
s_{i,j} denote the fixed ternary state of the signal at gate node n_i ∈ NG_t during phase φ_j. We
partition the set NDS_t of drain and source nodes of pass transistors in the PTB into three subsets:

1.  N_i = {n_x ∈ NDS_t : NODTYP(n_x) = "input"}, the set of nodes of input strength,

2.  N_p = {n_x ∈ NDS_t : NODTYP(n_x) = "pullup"}, the set of nodes of pullup strength, and

3.  N_n = {n_x ∈ NDS_t : NODTYP(n_x) = "normal"}, the set of nodes of normal strength.

We are given the sequences of transitions at each node in N_i and N_p in the PTB. Our task is to
compute the sequences of transitions at the nodes in N_n. We do this in phases. Initially all the node
sequences for N_n are set to the null sequence. We then simulate the PTB in the first phase, followed
by the next phase, and so on, updating the node sequences for the normal nodes in each phase. The
simulation of a phase φ_j begins by constructing an undirected graph H_t with vertex set V_t = NDS_t,
corresponding to the drain and source nodes of the pass transistors, and the edge set E_t initially
empty. For each pass transistor m_i ∈ M_t an edge is inserted between DRAIN(m_i) and SOURCE(m_i) if
s_{i,j} = 1, i.e., if the signal at the gate node of the transistor is at a 1 during φ_j. Each
connected component of the graph represents a switching network with nodes connected by two-terminal
switches that are in the closed state. Consider a component C_r of the graph. Let STG_r denote the
subset of the strongest nodes (vertices) in C_r, where the node strengths are ordered as
input > pullup > normal. The strength of the component C_r is then defined to be the strength of its
strongest node(s). If |STG_r| > 1 and the strength of C_r is either input or pullup, then a conflict
is declared at each normal node in the component. If a node is experiencing a conflict in the present
phase there are two possibilities: the node was in a conflict in the previous phase or it was not. In
the former case the duration of the present phase is added to the existing value of the duration of
the conflict. In the latter case the conflict is said to have started in the present phase and its
duration is set to the duration of the phase.
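The first part of a phase can be sketched as follows. This is only an illustration under assumed data structures: transistors are given as hypothetical (gate, drain, source) triples, `nodtyp` maps each drain or source node to its strength, and the labels returned by `classify` ("conflict", "driven", "charge sharing", "isolated") are our own shorthand for the cases discussed in the text.

```python
RANK = {"normal": 0, "pullup": 1, "input": 2}

def phase_components(nodtyp, transistors, gate_state):
    """Connected components of the graph whose edges are the closed switches.

    nodtyp      : dict node -> "input" | "pullup" | "normal"
    transistors : list of (gate, drain, source) triples
    gate_state  : dict gate -> "0" | "u" | "1" for the current phase
    """
    adj = {n: set() for n in nodtyp}
    for gate, drain, source in transistors:
        if gate_state[gate] == "1":          # only closed switches give edges
            adj[drain].add(source)
            adj[source].add(drain)
    seen, comps = set(), []
    for start in nodtyp:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                          # depth-first search
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def classify(comp, nodtyp):
    """Return (strength of component, situation at its normal nodes)."""
    top = max(RANK[nodtyp[n]] for n in comp)
    strongest = [n for n in comp if RANK[nodtyp[n]] == top]
    strength = nodtyp[strongest[0]]
    if strength == "normal":
        return strength, "charge sharing" if len(comp) > 1 else "isolated"
    if len(strongest) > 1:
        return strength, "conflict"
    return strength, "driven"

# Hypothetical PTB: two input nodes a, b and two normal nodes x, y.
nodtyp = {"a": "input", "b": "input", "x": "normal", "y": "normal"}
transistors = [("g1", "a", "x"), ("g2", "b", "x"), ("g3", "x", "y")]
for comp in phase_components(nodtyp, transistors, {"g1": "1", "g2": "1", "g3": "0"}):
    print(sorted(comp), classify(comp, nodtyp))
```

Tracking the accumulated duration of a conflict from phase to phase is left out of the sketch.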

If the strength of C_r is normal, then charge sharing is said to take place among the normal nodes
in the component. Given any sequence of transitions, one can define the initial value of the signal to
be the ternary value before the occurrence of the first transition and the final value to be the one
after the last transition. For each node n_x ∈ C_r let S(n_x) denote the existing sequence of
transitions at the node and s_x denote the final value of this sequence. We define an equivalent
voltage corresponding to the ternary signal s_x as v_x = 0.0, α·VDD, or VDD depending on whether
s_x = 0, u, or 1, respectively, where 0 < α < 1 is an empirical parameter. The default value for α is
0.5. The charge on a node n_x is defined to be the product v_x·CAP(n_x), where CAP(n_x) is a lumped
capacitance from node n_x to ground. In the case of charge sharing among the nodes of a component of
normal strength, the total charge in the component is computed by summing up the charges on each node
in the component and this quantity is divided by the total capacitance to yield a final voltage

$$
v_f = \frac{\sum_{n_x \in C_r} v_x \cdot \mathrm{CAP}(n_x)}{\sum_{n_x \in C_r} \mathrm{CAP}(n_x)}.
$$

The final ternary value s_f reached by all the nodes in the component after charge sharing is then
computed from v_f as s_f = 0, u, or 1 depending on whether v_f ≤ V_L, V_L < v_f < V_H, or V_H ≤ v_f,
respectively, where V_L and V_H are the low and high thresholds as defined in Chapter 3. For each node
n_x, if s_x = s_f then no further analysis is required. Otherwise, if either s_x or s_f is a u, then
the transition (s_x, s_f, k_j) is appended to the sequence S(n_x). If s_x ∈ {0,1} and s_f = ¬s_x,
then the pair of transitions (s_x, u, k_j), (u, s_f, k_j) is appended to the node sequence S(n_x). The
transition times are then suitably delayed and the sequence is filtered appropriately.
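The charge-sharing rule can be illustrated by the following sketch. The function names are ours, and the threshold values `vl` and `vh` are placeholders chosen only for the example, since V_L and V_H are defined in Chapter 3; α = 0.5 and VDD = 5 V follow the text.

```python
def charge_share(final_states, cap, vdd=5.0, alpha=0.5, vl=1.0, vh=4.0):
    """Final voltage and ternary value of a normal-strength component.

    final_states : dict node -> "0" | "u" | "1" (final value of S(n_x))
    cap          : dict node -> lumped capacitance to ground
    vl, vh       : assumed low/high thresholds (placeholders, not the
                   values used in the dissertation)
    """
    volt = {"0": 0.0, "u": alpha * vdd, "1": vdd}
    total_charge = sum(volt[final_states[n]] * cap[n] for n in final_states)
    total_cap = sum(cap[n] for n in final_states)
    vf = total_charge / total_cap
    if vf <= vl:
        sf = "0"
    elif vf >= vh:
        sf = "1"
    else:
        sf = "u"
    return vf, sf

def new_transitions(s, sf, k):
    """Zero-delay transitions appended to a node whose final value is s."""
    if s == sf:
        return []                       # nothing to do
    if "u" in (s, sf):
        return [(s, sf, k)]             # single transition to or from u
    return [(s, "u", k), ("u", sf, k)]  # 0 <-> 1 goes through u

# A node at 1 with 0.03 pF shares charge with a node at 0 with 0.01 pF.
vf, sf = charge_share({"a": "1", "b": "0"}, {"a": 0.03, "b": 0.01})
print(round(vf, 3), sf, new_transitions("1", sf, 2060), new_transitions("0", sf, 2060))
```

The delay and filtering steps that follow the append are the subject of Chapter 5 and are not modelled here.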

If |STG_r| = 1 and the strength of the component is either input or pullup, then the component is
simulated as follows. Let n_s be the unique strongest node in the component. Let S_s be the sequence
of transitions at the strongest node occurring within the phase, i.e., taking place between k_j and
k_{j+1}. Consider a normal node n_x in this component. If the node was experiencing a conflict in the
previous phase then the conflict is declared as resolved in the present phase. Suppose a conflict that
existed between times k_i and k_j for some i < j at n_x has now been resolved in the present phase. If
the duration of the conflict, k_j - k_i, is more than a preselected parameter known as the conflict
parameter, then the conflict at n_x is declared a major conflict; otherwise, it is a minor conflict.
In case of a major conflict, a transition from the state of the node n_x just before k_i to the u
state is created at time k_i, followed by a transition from u to the initial value of S_s at time k_j.
Thus, in a major conflict, a node is forced to occupy the u state for the entire duration of the
conflict. Minor conflicts are totally ignored. Once all conflicts (if any) are resolved, we again
consider each normal node n_x in the component. If the initial value of S_s is different from the
final value of the existing sequence S(n_x), then the appropriate transitions to the initial value of
S_s are appended to the node sequence S(n_x), followed by appending the sequence S_s itself. Each of
the transitions appended is then suitably delayed and filtered.
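A minimal sketch of the conflict-resolution rule is given below. The function name and argument names are hypothetical; the rule itself (major if the duration exceeds the conflict parameter, otherwise ignored) is the one stated above, and the sketch assumes the node was in a 0 or 1 state before the conflict began.

```python
def resolve_conflict(state_before, k_start, k_end, initial_of_strongest,
                     conflict_param):
    """Transitions created when a conflict over (k_start, k_end) is resolved.

    state_before         : state of the node just before k_start
                           (assumed here to be "0" or "1")
    initial_of_strongest : initial value of S_s in the resolving phase
    conflict_param       : conflict parameter, in integer time units
    """
    if k_end - k_start > conflict_param:
        # Major conflict: the node sits in the u state for the whole conflict.
        return [(state_before, "u", k_start),
                ("u", initial_of_strongest, k_end)]
    return []  # minor conflict: ignored

# A conflict lasting 5 time units with a conflict parameter of 10 is minor.
print(resolve_conflict("0", 6070, 6075, "1", 10))
# A hypothetical conflict lasting 75 time units is major.
print(resolve_conflict("0", 6000, 6075, "1", 10))
```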


Thus far, we have only considered transistors which are in the closed state during a phase φ_j. A
pass transistor is said to have a state u' if its gate node is at the u state in the present phase but
occupies a 1 in the next phase. A transistor in the u' state in the present phase is in an
intermediate conducting state but would occupy a closed state during the next phase. This
interpretation is quite different from the interpretation of the presence of the X state at the gate
node of a transistor in conventional switch-level simulators such as MOSSIM [19]. The second part of
the simulation of the PTB within a phase begins by constructing a supergraph with a vertex for each
component C_r of H_t and an edge between two vertices C_r and C_s if a transistor in the u' state has
its drain node in C_r and source node in C_s or vice versa. The transistors whose gate signals are in
the 0 state, or in a u but not in a u' state, are ignored during the present phase. The connected
components of the supergraph partition the components of H_t into supercomponents, such that each
supercomponent consists of a set of components linked by pass transistors in the u' state.

If a supercomponent consists of only one component, then no further analysis is required for this
phase. Otherwise, the strength of the supercomponent is computed as the strength of the strongest
component. If the strength of a supercomponent is input or pullup and it contains more than one
strongest component, then this would lead to a conflict in the next phase and the simulation is
postponed until the next phase. If the strength of a supercomponent is normal then this would clearly
lead to charge sharing in the next phase and, once again, the simulation is postponed until the next
phase. The only situation left to consider is when the strength of a supercomponent is input or pullup
and it has only one strongest component. Suppose the strongest component has only one strongest node
whose final value in the present phase is s_f. Then for each node in each normal component, the
transitions from the final value of its node sequence to s_f are appended to the node sequence. The
transitions are delayed only if the node is not in a conflict during the present phase. If the
strongest component has more than one strongest node then, once again, the simulation is postponed
until the next phase.
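The case analysis for a supercomponent can be summarized in a short sketch. Each component is described here by an assumed pair (strength, number of strongest nodes), and the return values "nothing", "postpone", and "propagate" are our own labels for the three outcomes described above.

```python
RANK = {"normal": 0, "pullup": 1, "input": 2}

def supercomponent_action(components):
    """Decide what the second part of a phase does with one supercomponent.

    components : list of (strength, number_of_strongest_nodes) pairs,
                 one per component linked by transistors in the u' state.
    """
    if len(components) == 1:
        return "nothing"                  # no further analysis required
    top = max(RANK[s] for s, _ in components)
    strongest = [(s, k) for s, k in components if RANK[s] == top]
    if top == RANK["normal"]:
        return "postpone"                 # charge sharing in the next phase
    if len(strongest) > 1:
        return "postpone"                 # conflict in the next phase
    if strongest[0][1] > 1:
        return "postpone"                 # strongest component is in conflict
    return "propagate"                    # append transitions to s_f

# Two input components and a pullup component linked together: postponed.
print(supercomponent_action([("input", 1), ("input", 1), ("pullup", 1)]))
# One pullup component linked to a normal component: value is propagated.
print(supercomponent_action([("pullup", 1), ("normal", 1)]))
```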


The algorithm described above for the simulation of a PTB is somewhat heuristic, and instead of
presenting a formal description we will illustrate several of its features through an example.
Consider the PTB shown in Figure 4.7, consisting of six pass transistors. We would like to simulate the
PTB between 0.0 and 80.0 ns with a minimum resolvable time h_min = 0.01 ns. Thus transition times will
be represented by integer multiples of 0.01 ns. Let us suppose that we will ignore any conflict
lasting less than 0.1 ns, i.e., we choose the conflict parameter ε_c = 10. For purposes of
illustration we use an arrowhead at a node to indicate input strength and a triangle to indicate
pullup strength. Thus nodes n_0, n_5, and n_6 are of input strength while nodes n_1, n_2, n_3, and n_4
are of pullup strength. The nodes n_7 and n_8 are normal nodes in the circuit. The set of gate nodes
for the pass transistors is NG = {n_1, n_2, n_3}. Let us assume the sequences of transitions at these
nodes, which have already been computed, to be

S_1 = (0,u,4025), (u,0,4060), (0,u,6025), (u,1,6070)

S_2 = (1,u,2015), (u,0,2025)

S_3 = (0,u,2025), (u,1,2060), (1,u,6075), (u,0,6100)

respectively. The signal at the ground node n_0 is at 0 for all time and that at the supply node n_6
is always at 1. Node n_5 is driven by a pulsed voltage source with a sequence of transitions

S_5 = (0,u,1001), (u,1,1005), (1,u,2014), (u,0,2018), (0,u,3002), (u,1,3006), (1,u,4013), (u,0,4017),
(0,u,5002), (u,1,5006), (1,u,6014), (u,0,6018), (0,u,7002), (u,1,7006)

and the node n_4, which is the output of an inverter with n_5 as input, has a zero-delay sequence
S_4 = ¬S_5. The transition times of the gate sequences S_1, S_2, and S_3, arranged in ascending order,
give us the set

Θ = {2015, 2025, 2060, 4025, 4060, 6025, 6070, 6075, 6100}

which has nine elements and hence results in ten phases. The first phase is φ_1 = (0,2015), the
second is φ_2 = (2015,2025), and so on, until the last phase, which is φ_10. We recall that s_{i,j}
is the fixed ternary state occupied by the gate node n_i during phase φ_j. We will represent these in
a 3×10 matrix

$$
A = \begin{bmatrix}
0 & 0 & 0 & 0 & u & 0 & u' & 1 & 1 & 1 \\
1 & u & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & u' & 1 & 1 & 1 & 1 & 1 & u & 0
\end{bmatrix}. \tag{4.7}
$$

For example, from the third column of the above matrix we see that nodes n_1 and n_2 are in the 0
state during the third phase and node n_3 is in the u' state. The second row of the matrix says that
node n_2 is in the 1 state during φ_1, in the u state during φ_2, and 0 from then on until the end.
The simulation in each phase will consist of two parts. In the first part we will construct a graph on
six vertices, namely n_0, n_4, n_5, n_6, n_7, and n_8, with edges corresponding to transistors whose
gate signals are in the 1 state. The second part will deal with a supergraph whose vertices are
components of the first graph and whose edges correspond to transistors with gate signals in the u'
state.
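The phases and the matrix of Equation (4.7) can be reproduced from the gate sequences S_1, S_2, and S_3 with a few lines of code. The sketch below is only illustrative: it assumes a transition is stored as a (from, to, time) triple and that the state during a phase is the target of the last transition at or before the initial time of the phase; the function names are ours.

```python
def phases(gate_seqs, t_start, t_end):
    """Phases (k_j, k_j+1) induced by the gate transition times."""
    times = sorted({t for seq in gate_seqs.values() for _, _, t in seq
                    if t_start < t < t_end})
    points = [t_start] + times + [t_end]
    return list(zip(points[:-1], points[1:]))

def state_from(seq, k):
    """Ternary state held from time k on (initial value if nothing happened)."""
    state = seq[0][0]
    for _, to, t in seq:
        if t <= k:
            state = to
    return state

def phase_matrix(gate_seqs, t_start, t_end):
    """Rows of gate states per phase, with u marked as u' when a 1 follows."""
    ph = phases(gate_seqs, t_start, t_end)
    rows = {}
    for gate, seq in gate_seqs.items():
        row = [state_from(seq, k) for k, _ in ph]
        rows[gate] = ["u'" if s == "u" and j + 1 < len(row) and row[j + 1] == "1"
                      else s for j, s in enumerate(row)]
    return ph, rows

gates = {
    "n1": [("0", "u", 4025), ("u", "0", 4060), ("0", "u", 6025), ("u", "1", 6070)],
    "n2": [("1", "u", 2015), ("u", "0", 2025)],
    "n3": [("0", "u", 2025), ("u", "1", 2060), ("1", "u", 6075), ("u", "0", 6100)],
}
ph, rows = phase_matrix(gates, 0, 8000)
print(len(ph), ph[0], ph[-1])
for gate, row in rows.items():
    print(gate, " ".join(row))
```

With the simulation interval 0 to 8000 this yields ten phases and the three rows of the matrix A.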

Phase  1,  (0,2015) 

From the first column of the matrix A in Equation (4.7) we see that in this phase only node n_2 is
in the 1 state. The graph is shown in Figure 4.8(a) and has four components. Components C_1 and C_4
have only one node each and therefore no analysis is necessary. The strength of C_2 is input and it
contains only one input node, n_0. The node n_0 is always in the 0 state, i.e., its corresponding
sequence is (u,0,-1). The normal node n_7 in this component will have this transition appended to its
existing sequence, which is initially the null sequence. The strength of the component C_3 is also
input and it also contains only one strongest node, namely n_5. The sequence of transitions in S_5
occurring within φ_1 is (0,u,1001), (u,1,1005). This will be the zero-delay sequence to be appended to
the sequence at the normal node n_8 in this component. Thus on delaying and filtering the sequences at
the normal nodes we get

S_7 = (u,0,-1)

S_8 = (0,u,1022), (u,1,1188).

Since there are no transistors in the u' state in this phase, the second part of the phase simulation
can be bypassed.


Phase  2,  (2015,2025) 


There are no transistors in this phase with gate signals either in the 1 or in the u' state, and hence
the entire phase simulation can be bypassed. There is no change in the sequences S_7 and S_8 given
above.

Phase  3,  (2025,2060) 

There are no transistors with gate signals in the 1 state in this phase. The graph therefore will
have no edges and hence the first part of the simulation in this phase can be bypassed. The signal at
node n_3, however, is in the u' state, thus resulting in a supergraph with six vertices and two edges
as shown in Figure 4.8(b). Two of the supercomponents, SC_1 and SC_3, contain only one component each
and therefore need not be analyzed any further. The supercomponent SC_2 has C_6 as its only strongest
component, and the final value of node n_5, which is the only node in C_6, in this phase is 0. Since
this agrees with the final value of the existing sequence at node n_7, which also happens to be the
only normal node in C_2, we conclude that there is no change in sequence S_7 in this phase. The
supercomponent SC_4 is of strength pullup, consisting of a pullup component C_4 and a normal component
C_5. The component C_4 consists of only one pullup node n_4, the final value of whose sequence in this
phase is a 1. Once again this agrees with the final value of the existing sequence at node n_8, which
is the only normal node in C_5, and hence there is no change to either S_7 or S_8 in this phase.

Phases  4, 5,  and  6,  (2060,6025) 

From time 2060 up to 6025, node n_3 is fixed at the 1 state and node n_2 is fixed at 0. Node n_1,
however, is at 0 during φ_4 = (2060,4025), occupies the u state temporarily during φ_5 = (4025,4060),
and comes back to the 0 state during φ_6 = (4060,6025). The graph during these three phases is shown
in Figure 4.8(c). It consists of four components. Two of these components, C_1 and C_2, contain only
one vertex each and need not be analyzed any further. In C_3, node n_7 is connected to n_5. Let S
denote the sequence of transitions in S_5 occurring between 2060 and 6025, i.e.,

S = (0,u,3002), (u,1,3006), (1,u,4013), (u,0,4017), (0,u,5002), (u,1,5006), (1,u,6014), (u,0,6018).

Since the initial value of S agrees with the final value of the existing S_7, we simply append S to
S_7. In C_4, node n_8 is connected to the node n_4. The sequence of transitions in S_4 occurring
during these phases is clearly ¬S. Once again, since the initial value of ¬S, which is 1, agrees with
the final value of the existing S_8, we simply append ¬S to S_8. On delaying the transitions that were
just appended we get

S_7 = (0,u,3023), (u,1,3189), (1,u,4032), (u,0,4076), (0,u,5023), (u,1,5189), (1,u,6032), (u,0,6076)

S_8 = (0,u,1022), (u,1,1188), (1,u,3065), (u,0,3237), (0,u,4174), (u,1,4657), (1,u,5065), (u,0,5237),
(0,u,6175), (u,1,6658).

It must be noted that the last pair of transitions in S_8 takes place well after the end of these
phases and could be deleted by the filtering operator during the simulation of a later phase.
Furthermore, the second part of the simulation can be bypassed.

Phase  7,  (6025,6070) 

The graph during this phase is the same as the one in Figure 4.8(c) and there is no change in either
S_7 or S_8 after the first part of simulation in this phase. The supergraph constructed in the second
part of the simulation is shown in Figure 4.8(d). The supercomponent SC_2 consists of only one
component, C_1, and hence need not be analyzed any further. The supercomponent SC_1, however, consists
of three components, namely C_2, C_3, and C_4. C_2 and C_3 are of input strength while C_4 is of
pullup strength. Since the transistors linking the components in this phase would be closed in the
next phase, there is a possibility of the three components merging into one during the next phase. The
new component would then have two strongest nodes, thereby leading to a conflict. Hence, we do not
make any changes in either S_7 or S_8 even after the second part in this phase.

Phase  8,  (6070,6075) 

In this phase both n_1 and n_3 are in the 1 state, thus resulting in the graph shown in Figure
4.8(e). The component C_1 has two strongest nodes, namely n_6 and n_5. Therefore a conflict is
declared at the normal nodes n_7 and n_8. The duration of the conflict at both these nodes is
6075 - 6070 = 5, or 0.05 ns in real time. Note that this situation was anticipated in the second part
of the simulation of φ_7. The component C_2 has a single node and hence need not be analyzed any
further. Since there are no gate nodes in the u' state during this phase, the second part can be
bypassed.

Phase  9,  (6075,6100) 

The graph constructed in the first part of the simulation in this phase is shown in Figure 4.8(f). It
consists of four components, two of which, namely C_1 and C_2, have only one node each. The component
C_3 consists of a normal node n_7 connected to an input node n_6. The component C_4 has normal node
n_8 connected to the input node n_5. Since both n_7 and n_8 were involved in a conflict situation in
the previous phase, this conflict is now resolved. The total duration of the conflict at either node
was 5 in integer time, which is less than ε_c = 10, and hence the conflict is declared a minor
conflict and is ignored. The final value of S_7 is a 0 while the state of the node n_6 is a 1 since it
is the supply node. Hence we append the pair (0,u,6075), (u,1,6076) to S_7, which on delaying would
result in

S_7 = (0,u,3023), (u,1,3189), (1,u,4032), (u,0,4076), (0,u,5023), (u,1,5189), (1,u,6032), (u,0,6076),
(0,u,6106), (u,1,6273).

The initial state of the node n_5 in this phase is a 0. The final value of S_8 can be seen to be a 1.
However, the last pair of transitions (0,u,6175), (u,1,6658) takes place well after the present phase.
Hence this pair is deleted from S_8 and now the final value of S_8 is a 0, which agrees with the
initial state of the strongest node, n_5, in its component. This is an example of the filtering
operation to be discussed in Chapter 5. Since S_5 has no transitions occurring in this phase, we are
done with the first part of the simulation in this phase. The second part is bypassed. Thus the
sequence at node n_8 after this phase turns out to be

S_8 = (0,u,1022), (u,1,1188), (1,u,3065), (u,0,3237), (0,u,4174), (u,1,4657), (1,u,5065), (u,0,5237).

Note that we have deleted the last pair of transitions from the previous sequence S_8.

Phase  10,  (6100,8000) 

In this phase the graph remains the same as in the previous phase. The sequence S_7 does not change
since n_7 is still connected to the supply node n_6. The pair of transitions (0,u,7002), (u,1,7006)
from S_5 occurring within this phase gets delayed and appended to S_8.

Thus the final result is that the sequences at nodes n_7 and n_8 are

S_7 = (0,u,3023), (u,1,3189), (1,u,4032), (u,0,4076), (0,u,5023), (u,1,5189), (1,u,6032), (u,0,6076),
(0,u,6106), (u,1,6273)

and

S_8 = (0,u,1022), (u,1,1188), (1,u,3065), (u,0,3237), (0,u,4174), (u,1,4657), (1,u,5065), (u,0,5237),
(0,u,7023), (u,1,7189).

4.3  Conclusions

We began this chapter by defining transitions between ternary states and showed how sequences
of transitions can be used to represent ternary digital waveforms of signals. We also presented
algorithms that perform a switch-level simulation of SRC's, MFB's, and PTB's. In the case of an SRC
the sequence of transitions at the output node is constructed directly from the input description of
its analog waveform. In the case of an MFB we showed that the zero-delay state of its output node at
any instant of time is a B-ternary function of the states of its input nodes at the same time instant.
Furthermore, the output node of an MFB is of pullup strength and the only stronger node in the D-block
of the MFB is the ground node. On exploiting all these properties of an MFB, we came up with a fairly
simple graph algorithm, based on simplification of graphs and elimination of internal vertices in
simple graphs, to compute the sequence of transitions at the output node of an MFB directly from those
at the input nodes of the MFB. For a PTB, we presented a more complex, and somewhat heuristic,
approach utilizing the full power of conventional switch-level simulation. This approach is similar to
that of MOSSIM [19], except for the interpretation of the intermediate u state (or the X state as used
in MOSSIM). We illustrated the approach with the help of a simple example.

If a block of a partitioned network appears in a simple SCC, and if the SCCs have been processed
according to the ordering presented at the end of Chapter 3, then the sequence of transitions at each
input node to the block will be known for the entire time interval of interest. In this case the block
can be simulated for the entire period of time by algorithms described in this chapter. Otherwise, the
blocks are simulated only over certain windows in time. The end points of these windows are specified
by a special algorithm to be described in Chapter 6.


CHAPTER  5 


DELAY  AND  FILTERING  OPERATIONS 


The  algorithms  described  in  the  previous  chapter  compute  zero-delay  sequences  of  transitions  at 
the  output  nodes  of  an  MFB  and  normal  nodes  of  a  PTB.  By  zero  delay,  we  mean  a  transition  at  the 
gate  node  of  a  transistor  causes  a  transition  in  the  switching  state  of  the  transistor  immediately,  and 
this  change  affects  the  state  of  other  nodes  without  any  delay  in  time.  In  this  chapter  we  will  consider 
altering  the  transition  times  so  that  the  resulting  sequence  would  then  correspond  to  a  ternary 
waveform  that  is  fairly  close  to  the  ternary  equivalent  of  the  analog  waveform  if  computed  by  an 
accurate  circuit  simulator.  The  task  of  the  delay  operator  is  to  alter  the  transition  times  only  for  a 
complete  pair  of  transitions.  Each  application  of  the  delay  operator  is  followed  by  a  filtering  operation 
which  accounts  for  the  effect  of  delaying  a  complete  pair  of  transitions  on  the  future  transitions  in  the 
sequence.  The  filtering  operator  also  transforms  a  partial  pair  of  transitions  into  a  form  that  can  be 
handled  by  the  delay  operator. 

The delay operator is characterized by delay functions which are computed for a set of standard
circuit primitives and stored in tables. This step involves the use of an accurate circuit simulator to
simulate each primitive and could consume large amounts of computation time. The circuit primitives,
however, do not change as long as the technology remains fixed and hence the computations of the
delay functions need be performed only once for each technology. This step, therefore, can be
considered as a preprocessing phase since the same delay tables could be used to simulate many
different networks designed in a fixed technology. The delay operator then computes new values for
transition times in a complete pair of transitions at a certain node in a general block in two steps.
First, a mapping technique is used to transform the block into a configuration that resembles one of
the primitives for which the delay functions have been computed. Time scaling is then used to
transform the new configuration into a standard primitive, after which the delay values can be
obtained through a table lookup.

5.1  Computation  of  Delay  Functions  for  Standard  Primitives 

In  the  case  of  conventional  NMOS  depletion  load  technology,  we  consider  five  basic  configurations, 
called  primitives. 

Primitive 1: A simple inverter driving a lumped grounded capacitance C_1. An input signal V_in is
applied at the gate node of the driver transistor m_D and the output V_o is observed at the source
node of the load transistor m_L as shown in Figure 5.1. We consider two types of input waveforms,
namely,

Type "0": V_in rising from 0 V to 5 V

and

Type "1": V_in falling from 5 V to 0 V.

Primitive 2: A pass transistor m_P whose drain is connected to a constant DC voltage source V_dc and
the gate driven by a pulse V_in rising from 0 V to 5 V. The source node of m_P is connected to a
grounded capacitance C_2 as shown in Figure 5.2. The output waveform V_o in this case is observed at
the source node of m_P. We consider two types of V_dc, namely,

Type "0": V_dc = 0 V

and

Type "1": V_dc = 5 V.

Primitive 3: A pass transistor m_P whose gate is held fixed at 5 V and drain driven by an input pulse
V_in. The source node, which is also the output node, has a waveform V_o and is connected to a
grounded capacitance C_2 as shown in Figure 5.3. We consider two types of input waveforms, namely,

Type "0": V_in rising from 0 V to 5 V

and

Type "1": V_in falling from 5 V to 0 V.

Primitive 4: A simple inverter with driver transistor m_D and load m_L driving a pass transistor m_P.
Grounded capacitors C_1 and C_2 are connected to the pullup node of the inverter and to the source
node of the pass transistor, respectively. A pulse rising from 0 V to 5 V is applied at the gate of
the pass transistor m_P while the gate of the driver transistor m_D is connected to a fixed DC voltage
source V_dc as shown in Figure 5.4. There are two types of V_dc, namely,

Type "0": V_dc = 0 V

and

Type "1": V_dc = 5 V.

Primitive 5: Same configuration as primitive 4 except that the gate of the pass transistor m_P is held
fixed at 5 V while a pulse is applied at the gate of the driver transistor m_D as shown in Figure 5.5.
Here, we consider two types of input pulses, namely,

Type "0": V_in rising from 0 V to 5 V

and

Type "1": V_in falling from 5 V to 0 V.

In each of the above primitives we have an input waveform V_in which varies between VDD = 5 V
and 0 V and produces an output waveform V_o. For a fixed input waveform, the shape of the output
V_o could depend upon several circuit, device, and process parameters. The parameters we would
consider are the following: the zero-bias device threshold (VTO), both for enhancement and depletion
devices; a resistance for each device, which is a function of the ratio of its channel length (L) to
its width (W); the transconductance parameter, KP = μ_n ε_ox / t_ox, which in turn is a function of
the carrier mobility μ_n, the permittivity of the oxide material ε_ox, and the thickness of the oxide
t_ox; and finally, the capacitance at each node. Among these parameters we assume that all enhancement
transistors have the same zero-bias threshold, VTO_E, all depletion transistors have the same VTO_D,
and that these values remain fixed for a given technology. Typical values are VTO_E = +1.0 V and
VTO_D = -3.0 V. The rest of the parameters are allowed to vary between the different devices and nodes
in the network. In the five primitives described above, we let R_D = RES(m_D), R_L = RES(m_L), and
R_P = RES(m_P) denote the device resistances of the driver, load, and pass transistors, respectively.
We will choose a standard driver, a standard load, and a standard pass transistor, and let R_DS, R_LS,
and R_PS denote the resistances of these standard devices, respectively. A typical set of standard
devices is

Load:   W = 5 μ,  L = 10 μ,

Driver: W = 10 μ, L = 5 μ,

Pass:   W = 10 μ, L = 10 μ.

For the above choice of standard load and driver devices, we notice that R_LS/R_DS = 4. We will refer
to this ratio as the standard inverter ratio and denote it by δ_S.

Let $C_{iS}$ denote the standard capacitance in the case of the ith primitive. Typically, $C_{iS}$ = 0.01 pF for i = 1, 2, 3 and $C_{iS}$ = 0.1 pF for i = 4, 5. A primitive is a standard primitive if $R_D = R_{DS}$, $R_L = R_{LS}$, $C_1 = C_{1S}$ in primitive 1; $R_P = R_{PS}$, $C_2 = C_{2S}$ ($C_{3S}$) in primitives 2 and 3; and $R_P = R_{PS}$, $R_L/R_D = \delta$, $C_2 = C_{4S}$ ($C_{5S}$) in primitives 4 and 5. In primitives 4 and 5 let us define two dimensionless quantities $\beta = R_D/R_P$ and $\gamma = C_1/C_2$. We use these two additional parameters to completely specify the standard primitive. We allow $\beta$ and $\gamma$ to be variable over ranges $[\beta_{min}, \beta_{max}]$ and $[\gamma_{min}, \gamma_{max}]$, respectively.

Consider one of the above primitives. We treat $V_{in}$ to be an analog ramp waveform with a full swing of $V_{DD}$. This waveform will then cross both the threshold voltages $V_L$ and $V_H$. Let $t_1$ and $t_2$ denote the two threshold crossing times. Clearly, this change in the input waveform would cause the output waveform $V_0$ to cross both the thresholds also. Let $t'_1$ and $t'_2$ be the output threshold crossing times. We define $\Delta_{in} = t_2 - t_1$ as a measure of the slew rate of the input signal, and two delay quantities: $\Delta t_1 = t'_1 - t_1$, known as the inertial delay, and $\Delta t_2 = t'_2 - t'_1$, known as the rise/fall delay. Thus, given $t_1$ and the two delay quantities, we can easily compute $t'_1 = t_1 + \Delta t_1$ and $t'_2 = t'_1 + \Delta t_2$. We will use the symbol $\Delta t_0$ to refer to both the delays collectively.
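As an illustration of these definitions, the following Python sketch extracts the threshold crossing times from sampled input and output waveforms and forms $\Delta_{in}$, $\Delta t_1$, and $\Delta t_2$. The function names, the linear interpolation between samples, and the sample threshold values (1 V and 4 V) are assumptions made for this example only; they are not taken from the simulator described here.

```python
def crossing_time(times, volts, level):
    """First time at which a sampled waveform crosses `level`
    (linear interpolation between adjacent samples)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the threshold")


def delays(times, v_in, v_out, v_low, v_high):
    """Return (delta_in, dt1, dt2) as defined in the text."""
    # Input crossings t1 <= t2, whichever threshold is reached first.
    t1, t2 = sorted((crossing_time(times, v_in, v_low),
                     crossing_time(times, v_in, v_high)))
    # Output crossings t1' <= t2'.
    tp1, tp2 = sorted((crossing_time(times, v_out, v_low),
                       crossing_time(times, v_out, v_high)))
    return t2 - t1, tp1 - t1, tp2 - tp1


# Assumed sample data: a rising input ramp and a falling output.
times = [float(t) for t in range(11)]
v_in = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
v_out = [5, 5, 5, 5, 4, 3, 2, 1, 0, 0, 0]
delta_in, dt1, dt2 = delays(times, v_in, v_out, v_low=1.0, v_high=4.0)
```

Sorting the two crossings lets the same routine serve rising and falling waveforms, since a falling waveform reaches $V_H$ before $V_L$.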


We will now consider computing $\Delta t_0$ for standard primitives. We first consider the standard primitive 1 with rising inputs, i.e., type "0". In this case the device sizes and node capacitances are fixed at their standard values. We consider an input ramp $V_{in}$ with a certain rise time, resulting in some value of $\Delta_{in}$. We then simulate this circuit using an accurate circuit simulator, such as SPICE2 [1], which gives us a falling waveform for $V_0$. From both the input and the output waveforms we can compute the threshold crossing times $t_1$, $t_2$, $t'_1$, and $t'_2$, and hence both the delays $\Delta t_1$ and $\Delta t_2$. We then repeat this for a falling input ramp, i.e., type "1", with the same slew rate as before and compute two more delay values. This experiment is then repeated with input ramps of different slew rates, each time producing four more delay values (two in each type), which are stored in a table as functions of $\Delta_{in}$. The entire procedure is repeated to generate the delay tables in the case of standard primitives 2 and 3. The tables in all three cases are one-dimensional since their entries are functions of only $\Delta_{in}$. Each table entry contains four values, namely, $\Delta t_1$ and $\Delta t_2$ for type "0" and the same for type "1".

In the case of standard primitives 4 and 5, we need to specify the values of $C_1$, $R_D$, and $R_L$ in order to completely specify the circuit. We do this with the help of the parameters $\beta$, $\gamma$, and $\delta$. For fixed values of these parameters, we get $C_1 = \gamma C_2$, $R_D = \beta R_P$, and $R_L = \delta R_D$, where $R_P$ and $C_2$ take on the standard values. For the present we consider the inverter ratio $\delta$ to be a fixed parameter. We will remove this restriction in the later sections. We start with some initial values for $\beta$ and $\gamma$, simulate the circuit using SPICE2 [1], and obtain a set of delay values for each value of $\Delta_{in}$. We repeat this procedure for different values of $\beta$ and $\gamma$ and generate three-dimensional delay tables. Each entry in the table contains four delay values as before; however, these values are now functions of three parameters, namely, the slew rate of the input $\Delta_{in}$, the ratio of driver to pass transistor resistance $\beta$, and the ratio of capacitances $\gamma$.

We have therefore described the generation of delay tables for a fixed technology. In case of a change in technology, the procedure has to be repeated to generate a new set of tables. It must be noted that we consider a change in the values of the zero-bias device thresholds $VTO_D$ and $VTO_E$ as a change in the process technology. However, if there is only a change in the transconductance parameter (KP) or any of the parameters that affect its value, we can use the same set of delay tables, as will be shown in Section 5.2. The delay values are plotted as functions of input slew rate $\Delta_{in}$ for primitives 1, 2, and 3, and as functions of $\Delta_{in}$, $\beta$, and $\gamma$ for primitives 4 and 5, in Appendix 1 for a particular technology.


5.2  Delay  Functions  for  Nonstandard  Primitives 

In this section we will show how we can compute the delay values for nonstandard primitives from the delay tables for standard primitives computed in the previous section. By nonstandard primitives, we mean primitives that have nonstandard devices and nonstandard node capacitances.

For the analysis below we choose a simple DC analog model for an NMOS transistor by ignoring body effect, channel length modulation, short channel effects, and other higher-order effects. Then for any primitive i, where i = 1, 2, 3, we can write the first-order differential equation for the output waveform in the following simplified form:

$$\frac{dV_0(t)}{dt} = \frac{1}{\sigma_i}\, f_i(V_0(t), V_{in}(t)) \qquad (5.1)$$

where $\sigma_i$ is a parameter of primitive i formed from a device resistance and a node capacitance (in Section 5.3 it is computed as, for example, $\sigma_3 = R_P C_2/(\eta\,KP)$), $\eta$ is a fixed constant for a given technology, and $f_1$, $f_2$, and $f_3$ are some nonlinear functions of their arguments. It must be noted that in the case of a nonstandard primitive 1, Equation (5.1) is obtained by assuming that the inverter ratio $\delta = \delta_S$, where $\delta_S$ denotes the standard inverter ratio. We justify this assumption with the following arguments. In the case of falling output waveforms, i.e., a type "0" situation, the current $I_D$ through the driver transistor is primarily responsible for discharging the output capacitance $C_1$ and hence there is no significant change if, in this case, the load transistor is replaced by a depletion device with $R_L = \delta_S R_D$. Similarly, for rising output waveforms, i.e., a type "1" situation, the current $I_L$ through the load transistor is primarily responsible for charging $C_1$ and hence there is no significant change if, in this case, the driver is replaced by one with $R_D = R_L/\delta_S$. It is, therefore, reasonable to assume that even in the case of a nonstandard primitive 1, the inverter ratio is fixed at $\delta_S$, and so $\delta$ need not be included as the third argument for the function $f_1$.

In the case of a nonstandard primitive i, where i = 4, 5, we can describe the analog behavior of the two unknown waveforms $V_1(t)$, the voltage across the capacitance $C_1$, and $V_0(t)$, the output voltage, with the help of the following two first-order differential equations:

$$\frac{dV_1(t)}{dt} = \frac{1}{\sigma_i}\, f_{i1}(V_0(t), V_1(t), V_{in}(t), \beta, \gamma) \qquad (5.2a)$$

$$\frac{dV_0(t)}{dt} = \frac{1}{\sigma_i}\, f_{i2}(V_0(t), V_1(t), V_{in}(t), \beta, \gamma) \qquad (5.2b)$$

where

$$\sigma_4 = \sigma_5 = \frac{R_P C_2}{\eta\, KP}$$

and $f_{41}$, $f_{42}$, $f_{51}$, and $f_{52}$ are some nonlinear functions of their respective arguments. Once again, we have not included the parameter $\delta$ as one of the arguments in the above functions since it is reasonable to assume that $\delta = \delta_S$, using the same arguments as in the case of primitive 1.

From Equation (5.1), it is clear that in a fixed technology, if we fix the input waveform $V_{in}$ and the value of the parameter $\sigma_i$, then we will get the same output waveform $V_0$ in primitives 1, 2, and 3. If, in addition, we also fix the type, namely "0" or "1", in a primitive, then fixing $V_{in}$ is equivalent to fixing the value of $\Delta_{in}$, which is the measure of the input slew rate. Hence, in the case of a nonstandard primitive i, where i = 1, 2, 3, the delays (both inertial delay and rise/fall delay) at the output, collectively denoted by $\Delta t_0$, are functions of only two parameters, namely, $\Delta_{in}$ and $\sigma_i$. In the case of primitives 4 and 5, from Equations (5.2a) and (5.2b), it is clear that if we fix $V_{in}$, $\beta$, $\gamma$, and $\sigma_i$, we will get the same waveforms for both $V_1$ and $V_0$. Hence, in the case of a nonstandard primitive i, where i = 4, 5, the delays $\Delta t_0$ are functions of four parameters, namely, $\Delta_{in}$, $\beta$, $\gamma$, and $\sigma_i$. In the previous section we have computed the delay functions for the case $\sigma_i = \sigma_{iS}$, where $\sigma_{iS}$ denotes the value of the parameter computed for a standard primitive i = 1, 2, 3, 4, or 5. Using the same set of delay tables, we will now demonstrate a technique, known as time scaling, to compute the delay functions for nonstandard primitives, i.e., primitives with $\sigma_i \neq \sigma_{iS}$.


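Suppose we introduce a new time variable $\tau$ defined by $t = \alpha\tau$, where $\alpha$ is a scale factor. We can then rewrite Equations (5.1), (5.2a), and (5.2b) in terms of $\tau$ as:

$$\frac{dV_0(\tau)}{d\tau} = \frac{\alpha}{\sigma_i}\, f_i(V_0(\tau), V_{in}(\tau)) \qquad (5.3)$$

and

$$\frac{dV_1(\tau)}{d\tau} = \frac{\alpha}{\sigma_i}\, f_{i1}(V_0(\tau), V_1(\tau), V_{in}(\tau), \beta, \gamma) \qquad (5.4a)$$

$$\frac{dV_0(\tau)}{d\tau} = \frac{\alpha}{\sigma_i}\, f_{i2}(V_0(\tau), V_1(\tau), V_{in}(\tau), \beta, \gamma). \qquad (5.4b)$$

If we now set $\alpha = \sigma_i/\sigma_{iS}$ in each of the above equations, we get:

$$\frac{dV_0(\tau)}{d\tau} = \frac{1}{\sigma_{iS}}\, f_i(V_0(\tau), V_{in}(\tau)) \qquad (5.5)$$

and

$$\frac{dV_1(\tau)}{d\tau} = \frac{1}{\sigma_{iS}}\, f_{i1}(V_0(\tau), V_1(\tau), V_{in}(\tau), \beta, \gamma) \qquad (5.6a)$$

$$\frac{dV_0(\tau)}{d\tau} = \frac{1}{\sigma_{iS}}\, f_{i2}(V_0(\tau), V_1(\tau), V_{in}(\tau), \beta, \gamma) \qquad (5.6b)$$
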

which are the same as Equations (5.1), (5.2a), and (5.2b), respectively, with $t$ and $\sigma_i$ replaced by $\tau$ and $\sigma_{iS}$. Thus, Equations (5.5), (5.6a), and (5.6b) represent the behavior of the standard primitives in a new time domain with $\tau$ as the time variable. The slew rate of the input in this new time domain is $\Delta_{in}/\alpha$. If $\Delta\tau_0$ denotes the delays (both inertial and rise/fall) at the output in the new time domain, then clearly $\Delta t_0 = \alpha\,\Delta\tau_0$. But $\Delta\tau_0$ can be obtained from the delay tables compiled in the previous section for standard primitives for input slew rate $\Delta_{in}/\alpha$ in primitives 1, 2, and 3, and for resistance ratio $\beta$ and capacitance ratio $\gamma$, as additional parameters, in primitives 4 and 5. Let $g_i(\Delta)$ denote the delay functions tabulated as a function of input slew rate $\Delta$ for standard primitive i, where i = 1, 2, or 3, and let $g_i(\Delta, \beta, \gamma)$ denote those tabulated as a function of input slew rate, resistance ratio, and capacitance ratio for standard primitive i = 4 or 5. We can then outline the scheme for computing the delay values of nonstandard primitives from those computed for standard primitives as follows:

a) Let i be the primitive number and let $\Delta_{in}$ be the input slew rate. Compute $\sigma_i$.

b) Compute $\alpha = \sigma_i/\sigma_{iS}$.

c) If i = 1, 2, or 3, then obtain $\Delta t_0 \leftarrow \alpha\, g_i(\Delta_{in}/\alpha)$.

d) If i = 4 or 5, then obtain $\Delta t_0 \leftarrow \alpha\, g_i(\Delta_{in}/\alpha, \beta, \gamma)$.
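Steps (a) through (c) can be sketched in Python for a one-dimensional table as follows. This is only an illustrative sketch: the text does not say how table entries between tabulated slew rates are obtained, so the piecewise-linear interpolation with clamping at the table ends, as well as the names `interpolate` and `scaled_delay` and the table values, are assumptions.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Piecewise-linear lookup in a 1-D table (xs ascending),
    clamped to the first/last entry outside the tabulated range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)
    w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + w * (ys[k] - ys[k - 1])


def scaled_delay(delta_in, sigma, sigma_std, table_slews, table_delays):
    """Delay of a nonstandard primitive (i = 1, 2, 3) by time scaling."""
    alpha = sigma / sigma_std                      # step (b)
    # Delay of the standard primitive at the scaled slew rate.
    d_tau = interpolate(table_slews, table_delays, delta_in / alpha)
    return alpha * d_tau                           # step (c)


# Assumed table for one delay quantity of a standard primitive.
slews = [0.0, 1.0, 2.0]
table = [1.0, 2.0, 4.0]
dt = scaled_delay(3.0, sigma=2.0, sigma_std=1.0,
                  table_slews=slews, table_delays=table)
```

For primitives 4 and 5 the same scaling applies, with the lookup additionally indexed by $\beta$ and $\gamma$.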

It must be pointed out that the delay functions for nonstandard primitives could be computed just as in the standard case by introducing an additional parameter $\sigma_i$ in each of the tables for the ith primitive. This would then mean storing two-dimensional tables for primitives 1, 2, and 3, and four-dimensional tables for primitives 4 and 5. By using the scaling technique outlined above, we have managed to obtain the delay values with only one-dimensional and three-dimensional tables, respectively. Thus we have considerably reduced both the CPU storage space and the preprocessing time for generating the delay tables. However, we have used a very simple device model for the NMOS transistors to derive this technique, and this could cause some errors in the delay predictions if more complex device models are used. This is one of the factors responsible for timing errors of the delay operator.


5.3 Delay Operator for MFB’s and PTB’s

In this section we describe a delay operator which alters the transition times in a complete pair of zero-delay transitions at the output node of an MFB and at normal and pullup nodes of a PTB.

We first consider an NMOS network in which each MFB is an inverter and each PTB consists of a single pass transistor. Let $n_0$ be the output node of an inverter and let $(x, u, k_j)$, $(u, \neg x, k_{j+1})$ be a pair of complete transitions of the zero-delay sequence $S_0$ computed by the switch-level simulation algorithms given in Chapter 4. Also, suppose that $n_0$ is not an input node of a PTB. Let $C_0 = CAP(n_0)$ denote the lumped capacitance from the output node to ground. Let $R_D$ and $R_L$ be the device resistances of the driver and load transistors of the inverter, respectively. Let us first consider the case x = 1. In this case the pair $(0, u, k_j)$, $(u, 1, k_{j+1})$ must have been in the sequence at the input node of the inverter. We model this as a type "0" situation in a primitive 1 with $\Delta_{in} = (k_{j+1} - k_j) \times h_{min}$. We then compute $\sigma_1 = (R_D C_0)/(\eta\,KP)$ and the scale factor $\alpha = \sigma_1/\sigma_{1S}$. Let $\Delta\tau_1$ and $\Delta\tau_2$ be the inertial and fall delay values obtained from the delay tables for the type "0" case in a standard primitive 1 corresponding to the input slew rate of $\Delta_{in}/\alpha$. We then compute $k'_j = k_j + \alpha\Delta\tau_1/h_{min}$ and $k'_{j+1} = k'_j + \alpha\Delta\tau_2/h_{min}$ and replace the transition times $k_j$ and $k_{j+1}$ by the new times $k'_j$ and $k'_{j+1}$, respectively, in the sequence $S_0$. In the case x = 0 we compute the new transition times in the same manner as above, except that we model it as a type "1" situation in a primitive 1 and compute $\sigma_1$ with $R_D = R_L/\delta_S$.
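The inverter case can be summarized by the following Python sketch. It is a hedged illustration rather than the simulator's code: the function name, the argument names, and the `lookup` callable (standing in for the standard primitive 1 delay tables and returning the inertial and rise/fall delays for a given type and scaled slew rate) are hypothetical, the product $\eta\,KP$ is passed as a single number, and the update for the second transition time follows $t'_2 = t'_1 + \Delta t_2$ from Section 5.1. No rounding of the new times to integer steps is attempted.

```python
def delay_inverter_pair(k_j, k_j1, x, r_d, r_l, c_0, *,
                        h_min, eta_kp, sigma_1s, delta_s, lookup):
    """Return the delayed transition times (k'_j, k'_{j+1}) at an
    inverter output for the zero-delay pair (x,u,k_j), (u,not x,k_{j+1})."""
    if x == 0:
        # Rising output: use a driver of resistance R_L / delta_S.
        r_d = r_l / delta_s
    # Falling output (x = 1) is a type "0" situation, rising is type "1".
    input_type = "0" if x == 1 else "1"
    delta_in = (k_j1 - k_j) * h_min
    sigma_1 = r_d * c_0 / eta_kp
    alpha = sigma_1 / sigma_1s
    d_tau1, d_tau2 = lookup(input_type, delta_in / alpha)
    k_j_new = k_j + alpha * d_tau1 / h_min
    k_j1_new = k_j_new + alpha * d_tau2 / h_min
    return k_j_new, k_j1_new


# Assumed stand-in for the tables: constant delays for any slew rate.
def constant_table(input_type, slew):
    return 2.0, 4.0


falling = delay_inverter_pair(100, 110, 1, r_d=2.0, r_l=8.0, c_0=1.0,
                              h_min=1.0, eta_kp=1.0, sigma_1s=1.0,
                              delta_s=4.0, lookup=constant_table)
```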

We now consider a PTB consisting of a single pass transistor. The only situation in which we will use the delay operator is when one node among the drain and source nodes is a normal node and the other is either a pullup node or a node of input strength. Without loss of generality we assume that the source node is the normal node with a capacitance $C_2$. Consider a certain phase in the simulation of the PTB and let the complete pair of transitions $(x, u, k_j)$, $(u, \neg x, k_{j+1})$ be discovered at the source node during this phase. Let us first consider the state of the gate node to be fixed at 1 during this phase. Then clearly the same pair of transitions must have occurred at the drain node during this phase. If the drain node is of input strength, then this is modeled as a primitive 3 with type "0" if x = 0 and type "1" if x = 1. In either case the delay values for this nonstandard primitive are computed with $\Delta_{in} = (k_{j+1} - k_j) h_{min}$ and $\sigma_3 = (R_P C_2)/(\eta\,KP)$, where $R_P$ is the resistance of the pass transistor. If the drain node is of pullup strength, then let $R_D$ and $R_L$ denote the resistances of the driver and load transistors in the corresponding inverter and let $C_1$ be the capacitance at the drain node. If x = 1, we model this as a type "0" situation in primitive 5 and compute $\Delta_{in}$ and $\sigma_5$ as in the case of primitive 3, shown above. In addition, we compute $\beta = R_D/R_P$ and $\gamma = C_1/C_2$. If x = 0, we model this as a type "1" situation in primitive 5 and compute the same parameters as before, except that $\beta = R_L/(\delta_S R_P)$. In either case we can alter the transition times $k_j$ and $k_{j+1}$ by computing the delay values for the appropriate nonstandard primitives. We now consider the case when the gate node of the pass transistor is in the u* state in the phase. By definition, there must be a transition $(u, 1, k_j)$ at the gate node. Let the transition time of the previous transition at the gate node be $k_i$, where $k_i < k_j$. If the drain node is of input strength we model this as a primitive 2 with type "0" if x = 1 and type "1" if x = 0. If the drain node is of pullup strength, we model this as a primitive 4 with type "0" if x = 0 and type "1" if x = 1. In all these situations we compute $\Delta_{in} = (k_j - k_i) h_{min}$ and the other parameters as in the previous case and compute the delay values for the appropriate nonstandard primitive.

We have thus defined the delay operator for inverters and PTB’s consisting of single pass transistors. In the case of a general MFB, we describe a mapping technique that maps the MFB into an equivalent inverter and use the delay operator on the inverter. In the case of a general PTB, we describe a mapping technique based on the use of the Elmore time constant [46], which maps a component (or a supercomponent) occurring in a phase during the simulation of the PTB into an equivalent single pass transistor driving some equivalent load capacitance. We can then use the delay operator, defined above, on this single equivalent pass transistor.

Consider an MFB with output node $n_0$ and a load transistor of resistance $R_L$. Suppose $C_1 = CAP(n_0)$ is the capacitance at the output node of the MFB. Now, let us consider the case when a complete pair of zero-delay transitions $(0, u, k_j)$, $(u, 1, k_{j+1})$ occurs at the output node. We then map the MFB into an equivalent inverter driving the capacitance $C_1$, with a load transistor having a resistance $R_L$ and a driver transistor with resistance $R_L/\delta_S$, where $\delta_S$ is the standard inverter ratio. If $(1, u, k_j)$, $(u, 0, k_{j+1})$ is the sequence of transitions at the input node of the equivalent inverter, then $(0, u, k_j)$, $(u, 1, k_{j+1})$ would be the zero-delay sequence at the output node of such an inverter. Thus, the two configurations are zero-delay equivalent. We assume that these two are also delay-equivalent and obtain new transition times $k'_j$ and $k'_{j+1}$ by using the delay operator on the inverter and treat these as the new transition times at the output node $n_0$ of the MFB. Let us then consider the other case when the zero-delay transitions $(1, u, k_j)$, $(u, 0, k_{j+1})$ occur at the output node of the MFB. In this case we first construct a network of resistances with a resistance of value $R_i = RES(m_i)$ between the drain and source nodes of a driver transistor $m_i$ if its gate node is at the 1 state in the interval $(k_{j+1}, k_{j+1}+1)$. Let $R_{eq}$ denote the equivalent resistance between $n_0$ and ground in such a network. Let $C_{eq}$ denote the sum of all capacitances at the internal nodes of the above network and $C_L = C_1 + C_{eq}$ denote the total capacitance obtained by lumping all the internal node capacitances on the output node. We then map the MFB into an equivalent inverter driving a net capacitance $C_L$ with a driver transistor of resistance $R_D = R_{eq}$ and load transistor of resistance $R_L = \delta_S R_{eq}$. The sequence at the input node of the equivalent inverter would then be $(0, u, k_j)$, $(u, 1, k_{j+1})$. We have two zero-delay equivalent configurations once again and we define the delay operator on the MFB to be the delay operator on the equivalent inverter. We illustrate the mapping technique with an example shown in Figure 5.6(a). In this case, the zero-delay sequence at the output node $n_0$ is (1,u,100), (u,0,200). In the time interval (200,201), we see that the signals at the gates of transistors $m_1$, $m_3$, $m_4$, and $m_5$ are each in the 1 state. Hence, we obtain the resistive network as shown in Figure 5.6(b), with $R_i = RES(m_i)$, and compute the equivalent impedance $R_{eq}$. The equivalent inverter, shown in Figure 5.6(c), consists of a driver with resistance $R_{eq}$ and a load with resistance $\delta_S R_{eq}$. The signal at the gate terminal of the driver is (0,u,100), (u,1,200) and the effective load capacitance at the output node of the inverter is the sum of the node capacitances at nodes $n_0$, $n_1$, and $n_2$ in the original MFB as shown in Figure 5.6(c).

We now digress a little to discuss the implementation of an algorithm to find the equivalent conductance between two terminals a and b in a network of conductances (or resistances). We can treat such networks as weighted graphs with each edge having a weight equal to the corresponding resistance and hence can use the terminology we developed for graphs for networks as well. Any node other than a or b in the network is an internal node. Clearly, any set of parallel conductances can be replaced by a single conductance equal to the sum of the parallel conductances. We define this process as the simplification of the network. Now consider an internal node of degree 2 in the network. We can eliminate this node from the network by replacing the conductances $G_1$ and $G_2$ connected to it by a conductance of value $G_1 G_2/(G_1 + G_2)$ between the two nodes adjacent to it. It must be noted that eliminating such a node does not change the equivalent conductance between the nodes a and b in the network. We now extend the notion of eliminating an internal node to nodes of degree $k \geq 2$. Let $n_0$ be an internal node of degree $k \geq 2$ in a simplified network and let $n_1, n_2, \ldots, n_k$ be its adjacent nodes. Let $G_i$ be the conductance between $n_0$ and $n_i$ for each $i = 1, 2, \ldots, k$. We then define the elimination of $n_0$ from the network to be a new network without $n_0$ with a conductance $G_{ij}$ between each pair of nodes $n_i$ and $n_j$ originally adjacent to $n_0$ such that $G_{ij} = G_i G_j/G_{tot}$, where $G_{tot} = \sum_{i=1}^{k} G_i$ is the sum of all the conductances connected to $n_0$ in the old network.

Theorem  5.1  :  The  elimination  of  an  internal  node  from  a  simple  network  does  not  change  the 
equivalent  conductance  between  the  nodes  a  and  b  in  the  network. 

Proof : Let $n_0$ be an internal node of degree $k \geq 2$ and let $n_1, n_2, \ldots, n_k$ be its adjacent nodes. Let $I_i$ denote the current flowing through $G_i$ from $n_i$ to $n_0$ in the network for each $i = 1, 2, \ldots, k$, as shown in Figure 5.7(a). If for each $i = 1, 2, \ldots, k$ we can show that the sum of the currents flowing away from $n_i$ through all the conductances $G_{ij}$, $j = 1, 2, \ldots, k$, $j \neq i$, in the new network is equal to $I_i$, then we are clearly done with the proof. To this end, suppose $V_i$ denotes the voltage at node $n_i$ for each $i = 0, 1, \ldots, k$. Then $I_i = G_i(V_i - V_0)$ and $\sum_{i=1}^{k} I_i = 0$. Therefore,

$$V_0 = \frac{\sum_{i=1}^{k} G_i V_i}{G_{tot}}$$

where $G_{tot} = \sum_{i=1}^{k} G_i$. On substituting this value for $V_0$ in the previous equation, we get for each $i = 1, 2, \ldots, k$

$$G_i V_i - G_i\,\frac{\sum_{j=1}^{k} G_j V_j}{G_{tot}} = I_i$$

which on simplification gives

$$\sum_{j \neq i} G_{ij}(V_i - V_j) = I_i.$$

Now the above equation is valid for each $i = 1, 2, \ldots, k$ and, furthermore, its left-hand side is precisely the total current leaving $n_i$ through the conductances $G_{ij}$, $j = 1, 2, \ldots, k$, $j \neq i$. The network obtained after eliminating $n_0$ is shown in Figure 5.7(b). Hence the proof is completed. □

Our algorithm to compute the equivalent conductance $G_{ab}$ between two terminals a and b in a network of resistances can now be described as follows:

1) Simplify the network, i.e., replace all conductances in parallel by a single conductance equal to the sum of the parallel conductances.

2) Pick an internal node of smallest degree in the existing simple network and eliminate it from the network.

3) Simplify the resulting network.

4) If there is an internal node in the existing network, then go to step 2. Otherwise, set $G_{ab}$ to be the conductance between a and b in the final network and STOP.
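The four steps above can be sketched in Python as follows. The data structure (a dictionary of neighbor-to-conductance dictionaries) and the function name are assumptions chosen for brevity, not the representation used in the simulator. Storing one entry per node pair keeps the network simple at all times, so steps 1 and 3 reduce to adding a new conductance into any existing entry.

```python
from collections import defaultdict


def equivalent_conductance(edges, a, b):
    """edges: iterable of (node_u, node_v, conductance) triples."""
    # Step 1: simplify by summing parallel conductances.
    g = defaultdict(dict)
    for u, v, c in edges:
        if u == v:
            continue  # a self-loop carries no current
        g[u][v] = g[u].get(v, 0.0) + c
        g[v][u] = g[v].get(u, 0.0) + c
    while True:
        internal = [n for n in g if n not in (a, b)]
        if not internal:
            break  # step 4: no internal node is left
        # Step 2: eliminate an internal node of smallest degree.
        n0 = min(internal, key=lambda n: len(g[n]))
        neighbours = g.pop(n0)
        g_tot = sum(neighbours.values())
        for n in neighbours:
            del g[n][n0]
        items = list(neighbours.items())
        for i, (ni, gi) in enumerate(items):
            for nj, gj in items[i + 1:]:
                # G_ij = G_i * G_j / G_tot; adding it to an existing
                # entry performs the simplification of step 3.
                extra = gi * gj / g_tot
                g[ni][nj] = g[ni].get(nj, 0.0) + extra
                g[nj][ni] = g[nj].get(ni, 0.0) + extra
    return g[a].get(b, 0.0)


# Balanced bridge of unit conductances: the expected result is 1.
bridge = [("a", "p", 1.0), ("a", "q", 1.0), ("p", "b", 1.0),
          ("q", "b", 1.0), ("p", "q", 1.0)]
g_ab = equivalent_conductance(bridge, "a", "b")
```

Eliminating a node of degree k adds up to k(k-1)/2 new conductances, which is one motivation for choosing the node of smallest degree first.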


Notice the similarity between this algorithm and Algorithm 4.2 used to compute the zero-delay sequences at the output nodes of an MFB. In fact, both these algorithms can be run in parallel on the same data base used for representing graphs. It must also be noted that the above algorithm would still work if we had picked any internal node as the next candidate for elimination. However, we pick the node with the smallest degree for the same reasons as explained in Algorithm 4.2. This completes our discussion on the implementation of the algorithm to compute the equivalent conductance between two terminals in a network of resistances.

We now describe the delay operator for a general PTB. We begin by introducing the notion of the Elmore time constant [46] in an RC-tree. A graph T is a tree if it is connected and has no cycles. In each tree, we can focus our attention on a special vertex called the root of the tree. If a vertex a is a root of a tree T, then T is said to be rooted at a, denoted by $T_a$. In any tree, there is a unique path from the root to any other vertex in the tree (in fact, there is a unique path between any two vertices in a tree). We say that a network composed of resistances and capacitances forms an RC-tree if the subnetwork of resistances, when viewed as a weighted graph, forms a tree and there is a capacitance from each node of the network to ground. Note that all capacitors in such a network are grounded, i.e., there are no floating capacitors. Consider an RC-tree rooted at node $n_0$ and let $n_1, n_2, \ldots, n_p$ be the rest of the nodes. Let $C_i$ denote the capacitance from node $n_i$ to ground, for each $i = 1, 2, \ldots, p$. Let $P_i$ denote the unique path from the root $n_0$ to the node $n_i$ and let $P_{ij} = P_i \cap P_j$ denote the portion of the path between the root and $n_i$ that is common to that between the root and $n_j$. Let $R_{ij}$ denote the sum of all the resistances in $P_{ij}$. If $P_{ij} = \emptyset$, then $R_{ij} = 0$. We can now associate a time constant $\tau_i$, known as the Elmore time constant, with each node $n_i$ in the RC-tree, defined as

$$\tau_i = \sum_{j=1}^{p} R_{ij} C_j.$$

Without loss of generality, we need only consider rooted trees in which the root vertex has degree 1, since if the root vertex has degree k > 1, then we can split this vertex and obtain k subtrees, each rooted at a vertex of degree 1. As far as computing Elmore time constants is concerned, we need only consider the subtree containing the node for which the time constant is to be computed since the node capacitances in the other subtrees have no effect on its computation. Let us, therefore, consider an RC-tree rooted at node $n_0$ and let $R_1$ be the (unique) resistance connected to $n_0$. An example of such a network is shown in Figure 5.8. Then for each node $n_i$ we define an Elmore equivalent capacitance $C_{eq,i}$ to be the ratio of the Elmore time constant $\tau_i$ to the resistance $R_1$, i.e., $C_{eq,i} = \tau_i/R_1$. For the node $n_1$ in the network in Figure 5.8, the values for the Elmore time constant and equivalent capacitance are

$$\tau_1 = R_1(C_1 + C_2 + C_3 + C_4 + C_5 + C_6 + C_7)$$

$$C_{eq,1} = C_1 + C_2 + C_3 + C_4 + C_5 + C_6 + C_7$$

while for node $n_7$ they are

$$\tau_7 = R_1(C_1 + C_2 + C_3 + C_4 + C_5 + C_6 + C_7) + R_3(C_3 + C_4 + C_6 + C_7) + R_4(C_4 + C_7) + R_7 C_7$$

$$C_{eq,7} = C_1 + C_2 + \left(1 + \frac{R_3}{R_1}\right) C_3 + \left(1 + \frac{R_3 + R_4}{R_1}\right) C_4 + C_5 + \left(1 + \frac{R_3}{R_1}\right) C_6 + \left(1 + \frac{R_3 + R_4 + R_7}{R_1}\right) C_7.$$
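The definition $\tau_i = \sum_j R_{ij} C_j$ can be evaluated directly, as in the Python sketch below. The tree is described by a hypothetical `parent` map in which each edge is identified by its lower node; the topology used in the example is an assumption chosen to agree with the expressions for $\tau_1$ and $\tau_7$ above (in particular, the attachment of $n_2$ and $n_5$ cannot be recovered from those expressions and is guessed), and the numerical resistances and capacitances are arbitrary.

```python
def elmore_time_constants(parent, resistance, capacitance):
    """parent[n]      : parent of node n in the rooted RC-tree
                        (the root itself has no entry);
    resistance[n]     : resistance of the edge between n and parent[n];
    capacitance[n]    : grounded capacitance at node n.
    Returns {n: tau_n} for every non-root node."""
    def path_edges(n):
        # Edges on the path root -> n, each named by its lower node.
        edges = set()
        while n in parent:
            edges.add(n)
            n = parent[n]
        return edges

    paths = {n: path_edges(n) for n in parent}
    tau = {}
    for i in parent:
        # R_ij is the total resistance shared by the paths to i and j.
        tau[i] = sum(
            sum(resistance[e] for e in paths[i] & paths[j]) * capacitance[j]
            for j in parent
        )
    return tau


# Assumed topology: n0-n1, n1-n2, n2-n5, n1-n3, n3-n4, n3-n6, n4-n7.
parent = {1: 0, 2: 1, 3: 1, 4: 3, 5: 2, 6: 3, 7: 4}
resistance = {n: float(n) for n in parent}   # R_n = n (arbitrary units)
capacitance = {n: 1.0 for n in parent}       # C_n = 1 (arbitrary units)
tau = elmore_time_constants(parent, resistance, capacitance)
c_eq_7 = tau[7] / resistance[1]              # Elmore equivalent capacitance
```

With these values the closed-form expression for $\tau_7$ evaluates to 1(7) + 3(4) + 4(2) + 7(1) = 34, which the routine reproduces.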

Let us now consider a phase in the simulation of a general PTB. Let O be a component of the
graph that is constructed in the first part of the simulation in this phase. The only kinds of components
on which we will be using the delay operator are those containing exactly one strongest node, with that
node being of input or pullup strength. The other kinds of components would lead to conflicts or
charge sharing. Therefore, let O be a component with the strongest node n_0, and let n_1, n_2, ..., n_p be
the rest of the nodes in the component. We then construct an RC-network from O by replacing each
edge by a resistance equal to the resistance of the corresponding pass transistor and connecting a capacitance
C_i = CAP(n_i) from each node n_i to ground. We first simplify the network and then obtain a spanning
RC-tree, T, from the network. By a spanning tree of a graph, we mean a subgraph which is a tree and
includes all the nodes of the original graph. The fact that every connected graph has a spanning tree is
a standard result in graph theory, the proof of which can be found in almost any textbook on the subject,
such as [50]. For each node n_i, i = 1, 2, ..., p, we compute the delays for a complete transition in
its node sequence as follows. Let R_1 be the unique resistance connected to the root n_0 in the tree T. In
case the degree of n_0 is k > 1, we split the node n_0 and consider the rooted subtree containing n_i.
We begin by computing the Elmore equivalent capacitance C_{eq,i} at this node, which involves the
computation of the Elmore time constant. We then construct an equivalent circuit with a single pass
transistor of resistance R_1 with drain node n_0 and source node n_i. The capacitance at node n_0 is
CAP(n_0) itself, while the capacitance at the source node of this equivalent pass transistor is C_{eq,i}. If n_0
is of input strength, then this is a nonstandard primitive 3. If n_0 is of pullup strength, then we
replace the corresponding MFB by its equivalent inverter and treat the whole configuration as a
nonstandard primitive 5. We then obtain the new transition times for node n_i by applying the delay
operator on the equivalent single pass transistor configuration. This process is repeated for each node in
the component. In case the node n_0 is of pullup strength, we delay the transitions in its sequence by
lumping all the capacitances in the RC-network at n_0 and reduce the resulting configuration to a
nonstandard primitive 1, using the mapping technique that maps an MFB into an equivalent inverter.
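The text does not say which spanning tree of the component is selected, nor how the network is simplified beforehand. As one possible illustration, the Python sketch below extracts a breadth-first spanning tree rooted at the strongest node. The function name spanning_tree and the adjacency-dictionary format are assumptions made for this example only.

```python
from collections import deque

def spanning_tree(adj, root):
    """Breadth-first spanning tree of a connected resistive network.

    adj maps each node to {neighbor: resistance of the joining pass
    transistor}. Returns (parent, R): parent[v] is the tree parent of v
    and R[v] is the resistance of the tree edge (parent[v], v).
    """
    parent, R = {}, {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, r in adj[u].items():
            if v not in seen:          # first visit decides the tree edge
                seen.add(v)
                parent[v] = u
                R[v] = r
                queue.append(v)
    return parent, R

# A small component with one loop (n1-n2-n3-n1); values are illustrative.
adj = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 2.0, 3: 4.0},
    2: {1: 2.0, 3: 3.0},
    3: {2: 3.0, 1: 4.0},
}
parent, R = spanning_tree(adj, root=0)
```

The loop edge between n2 and n3 is dropped, leaving a tree with one edge per non-root node, which is the form required for the Elmore computation.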

Let us now consider a supercomponent SC in the second part of the phase simulation of a PTB.
We will only consider the situation when SC has only one strongest component and that component
has only one strongest node. The other situations lead to conflicts or charge sharing and hence
are not handled by the delay operator. We will first restrict ourselves to the case when SC has only
one edge, say e. Let O_0 and O_1 be the two components joined by e and let R_p denote the resistance of
the pass transistor corresponding to this edge. We define contraction of a component to be collapsing all
the vertices of the component into a single node with capacitance equal to the sum of all the node
capacitances in the component. The strength of this node is the strength of the component. Without loss of
generality let us assume that O_0 is the stronger component. Hence, we will be interested only in
obtaining delay values for transitions at nodes in component O_1. Let n_1 be the node (drain or source) of
the pass transistor corresponding to e in the component O_1. We begin by obtaining a spanning tree T_1
of O_1 that is rooted at n_1. Let n_0 be the strongest node in O_0. We contract the component O_0 into a
single node, which we will still call n_0. We then modify the tree T_1 by including the node n_0 and
joining it to n_1 by the edge e. We then declare the root of the new tree T to be the node n_0. We then
construct an RC-tree rooted at n_0 by replacing each edge of T by the resistance of its corresponding pass
transistor and connecting a capacitance from each node to ground. We can now compute the Elmore equivalent
capacitance for each node in this RC-tree. Then for each node in O_1 we consider a single pass transistor
with drain node n_0 and its associated capacitance and source node driving the Elmore equivalent
capacitance of the node under consideration. This then corresponds to a nonstandard primitive 2 or 4
depending upon whether the node n_0 is of input or pullup strength, respectively.
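The contraction step for a single-edge supercomponent can be illustrated with a short Python sketch. All names here (attach_contracted_root, the dictionary layout, the label "n0") are hypothetical and serve only to make the construction concrete.

```python
def attach_contracted_root(caps_O0, parent_T1, R_T1, root_T1, R_p, n0="n0"):
    """Contract O_0 into one node n0 and hang the tree T_1 below it.

    caps_O0   : {node: capacitance} for the stronger component O_0
    parent_T1 : parent map of the spanning tree T_1 rooted at root_T1
    R_T1      : resistance of the edge into each non-root node of T_1
    R_p       : resistance of the pass transistor on the joining edge e
    """
    cap_n0 = sum(caps_O0.values())     # capacitance of the contracted node
    parent = dict(parent_T1)
    R = dict(R_T1)
    parent[root_T1] = n0               # edge e now joins n0 and n_1
    R[root_T1] = R_p
    return cap_n0, parent, R

cap_n0, parent, R = attach_contracted_root(
    caps_O0={"a": 0.5, "b": 0.25},
    parent_T1={"n2": "n1", "n3": "n1"},
    R_T1={"n2": 2.0, "n3": 3.0},
    root_T1="n1",
    R_p=1.5,
)
```

The resulting parent and R maps describe the tree T rooted at n0, on which the Elmore equivalent capacitance of each node of O_1 can then be evaluated.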

The case of a supercomponent SC having more than one edge seldom occurs in practice. We shall,
however, discuss this situation too for the sake of completeness. We begin by constructing a spanning
tree on the components of the supercomponent with the root being the strongest component, say O_0. Let
n_0 be the strongest node in O_0. Consider an edge e_k of this tree joining components O_i and O_j. Without
loss of generality, assume that O_i is closer to the root than O_j. In this case O_i is said to be the father
component and O_j the son component of e_k. For each edge e_k, then, we apply the delay
operator on the nodes of its son component by contracting all the components present in the path
connecting its father component to the root into a single node n_0 and treating e_k as joining n_0 and the son
component. This corresponds to the situation of SC having only one edge e_k and so we can now use the
RC-tree technique described in the previous paragraph.

We  have,  therefore,  described  the  delay  operator  which  could  be  used  to  alter  the  transition  times 
of  a  complete  pair  of  zero-delay  transitions  at  the  output  node  of  any  MFB  and  at  normal  and  pullup 
nodes  of  any  PTB.  There  are  mainly  two  steps  involved.  The  first  step  is  to  map  the  MFB  or  PTB  into 
a  nonstandard  primitive  and  the  second  step  is  to  use  time  scaling  to  compute  the  delay  values  in  non¬ 
standard  primitives  from  those  computed  for  standard  primitives.  Both  these  steps  could  cause  timing 
errors.  However,  as  we  shall  see  in  Chapter  7,  the  switch-level  timing  estimates  generated  by  this 
approach  are  fairly  accurate  in  a  variety  of  NMOS  circuits  considered. 


5.4  Filtering  Operation 

In this chapter, thus far, we have described a delay operator which alters the transition times in a
pair of complete transitions. Thus, if a sequence consists of only a pair of complete transitions, then we
can use the delay operator directly on this sequence. In this section we will consider the effect of
delaying a pair of complete transitions on the subsequent terms of the sequence. As an example,
consider an inverter, with the following zero-delay sequence computed at its output node.

S_o = (0,u,k_1), (u,1,k_2), (1,u,k_3), (u,0,k_4).

This sequence is the result of a compatible and chronological input sequence, and is, therefore, also
compatible and chronological. Let us first apply the delay operator to the first pair of transitions and
compute the new transition times k'_1 and k'_2. By definition, k'_1 < k'_2. If k'_2 < k_3 then we simply apply the
delay operator to the second pair also and compute the resulting delayed sequence to be:

\hat{S}_o = (0,u,k'_1), (u,1,k'_2), (1,u,k'_3), (u,0,k'_4).

This delayed sequence is compatible and chronological. If, however, k'_1 < k_3 < k'_2, this means that at
the time the driver transistor of the inverter starts to turn on, the output node is still in the u state
and so the (u,1)-type transition cannot occur at the output. Hence we simply compute the delayed
output sequence in this case to be:

\hat{S}_o = (0,u,k'_1), (u,0,k'_4)

which is a partial pair of transitions that would represent a glitch at the output node. Furthermore, if
k_3 < k'_1, then there cannot be any transitions taking place at the output and so the output remains in
the 0 state for all time, which is represented by the sequence:

\hat{S}_o = (u,0,-1).

What we have described above is an example of the filtering operator, which takes the zero-delay
sequence as its input sequence and, using the delay operator, computes an output sequence that provides a
better representation of the ternary equivalent of the analog waveform at the node under consideration.

We now describe the filtering operation in general. Consider any sequence S of transitions. We
mark a term of S as "delayed" if the delay operator has been used previously on this term; otherwise,
we mark it "undelayed." The subsequence of S consisting of all its terms marked "delayed" is called the
delayed part of S. The rest of the sequence is the undelayed part. Thus, we can consider any sequence
of transitions to be the catenation of its delayed part and its undelayed part. Let us consider S as an
input sequence to the filtering operator. The output of the filtering operation will then be a sequence \hat{S}
which is computed as described below. First, the filtering operator replaces any partial pair
(x,u,k_i), (u,x,k_{i+1}) of transitions in the undelayed part of S by two complete pairs
(x,u,k_i), (u,¬x,k_{i+1}), (¬x,u,k_{i+1}), (u,x,k_{i+1}). This is done by procedure COMPLETE (S) used
below. We will also make use of the procedure WINDOW (S, k_a, k_b) that returns those transitions in S
occurring between k_a and k_b. The algorithm that performs the filtering is given below.


Algorithm 5.1

procedure FILTER (S)
begin
    \hat{S} ← ∅;
    S ← COMPLETE (S);
    while there is a transition in S marked "undelayed" do
    begin
        (x,u,k_i) ← first transition marked "undelayed" in S;
        (u,¬x,k_{i+1}) ← next transition marked "undelayed" in S;
        k'_i, k'_{i+1} ← DELAY (k_i, k_{i+1});
        mark (x,u,k_i) as "delayed" in S;
        mark (u,¬x,k_{i+1}) as "delayed" in S;
        \hat{S} ← WINDOW (\hat{S}, 0, k_i);
        y ← final value of \hat{S};
        if (y = x) then
            append (x,u,k'_i), (u,¬x,k'_{i+1}) to \hat{S};
        else if (y = u) then
            append (u,¬x,k'_{i+1}) to \hat{S};
        end if
    end
    return \hat{S};
end

The sequence of transitions \hat{S} obtained after filtering can easily be verified to be compatible and
chronological.
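The filtering procedure can be tried out on the inverter example with a small Python sketch. This is an illustration under several assumptions: transitions are (from, to, time) tuples, the input already consists of complete pairs (so COMPLETE is omitted), the delay operator is passed in as a hypothetical callable delay(k_i, k_next), the window bounds are taken as inclusive, and an empty output sequence is taken to have the initial state as its final value.

```python
def window(seq, ka, kb):
    """Transitions of seq whose time lies in [ka, kb]."""
    return [t for t in seq if ka <= t[2] <= kb]

def filter_sequence(seq, delay):
    """Filter a zero-delay sequence made of complete transition pairs."""
    if not seq:
        return []
    initial = seq[0][0]                      # state before any transition
    out = []
    for i in range(0, len(seq) - 1, 2):      # one complete pair at a time
        (x, _, k_i), (_, not_x, k_next) = seq[i], seq[i + 1]
        kd_i, kd_next = delay(k_i, k_next)   # delayed transition times
        out = window(out, 0, k_i)            # drop what has not happened yet
        y = out[-1][1] if out else initial   # final value of the output
        if y == x:
            out += [(x, "u", kd_i), ("u", not_x, kd_next)]
        elif y == "u":
            out.append(("u", not_x, kd_next))
    # An empty result means the node never leaves its initial state.
    return out or [("u", initial, -1)]

slow = lambda a, b: (a + 1, b + 2)           # illustrative delay operators
very_slow = lambda a, b: (a + 5, b + 6)

full = filter_sequence(
    [("0", "u", 1), ("u", "1", 2), ("1", "u", 10), ("u", "0", 11)], slow)
glitch = filter_sequence(
    [("0", "u", 1), ("u", "1", 2), ("1", "u", 3), ("u", "0", 4)], slow)
flat = filter_sequence(
    [("0", "u", 1), ("u", "1", 2), ("1", "u", 3), ("u", "0", 4)], very_slow)
```

The three calls correspond to the three cases of the inverter example: both pairs survive, the output shows a glitch, and the output stays at 0 for all time.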
CHAPTER 6


SIMULATING  STRONGLY  CONNECTED  COMPONENTS 


In  this  chapter  we  discuss  the  use  of  a  special  windowing  technique  to  simulate  the  MFB’s  and 
PTB’s within a strongly connected component (SCC). The algorithm presented splits the entire time
interval  of  interest  [0,K]  into  various  time  slots  or  windows  such  that  all  pairs  of  signal  transitions 
(both  partial  and  complete)  take  place  entirely  within  one  of  these  windows.  This  is  achieved  by  main¬ 
taining  a  sequential  list  of  intervals  of  transitions  which  is  updated  dynamically  as  the  algorithm 
progresses.  The  algorithm  is,  in  a  sense,  event-driven,  since  only  those  circuit  blocks  that  are  active 
within  a  window  are  processed  and  the  fanouts  of  the  output  nodes  of  these  blocks  are  scheduled  for 
processing  in  the  future.  We  begin  by  reviewing  two  well-known  and  classical  techniques,  namely,  the 
waveform  relaxation  method  and  the  time-point  relaxation  method,  that  could  be  used  to  simulate  the 
blocks in the network. We will show that neither of these schemes is entirely suitable in our type of
simulation  and  hence  there  is  a  need  for  the  event-driven  windowing  technique  that  we  will  present. 


6.1  Waveform  Relaxation  Versus  Time-point  Relaxation 

Let Ω(N,M,E) be a partitioned NMOS network in which the set of blocks E is further
partitioned into its strongly connected components E_1, E_2, ..., E_M. Let [0,K] denote the time interval of
simulation. Suppose the SCC E_i is currently scheduled for processing. If E_i is a simple SCC then the
single block contained in it could be simulated during [0,K] by the algorithms discussed in the previous
chapters. Hence, suppose that E_i = {Ω_1, Ω_2, ..., Ω_p}, where p ≥ 2 and each Ω_j is either an MFB or a
PTB.


The blocks within E_i could then be simulated using a waveform relaxation iterative scheme
WR_SIM described below. Let R_i be an ordering on the blocks of E_i. Without loss of generality we
can assume that the blocks of E_i are placed according to R_i, i.e., R_i(Ω_j) = j for each j = 1, 2, ..., p. For
any node n_k ∈ N in the network let S_k denote the most recently computed sequence of transitions, or the
present sequence, at the node and let \bar{S}_k denote the previously computed sequence, or the past sequence,
at the node. Also, let s_k ∈ {0,1} denote the initial state at node n_k, which is either provided by the user
or arbitrarily set to 0. Let N_i denote the list of all the circuit nodes contained in the blocks within
E_i. The algorithm begins by setting the present sequence of transitions at any node that has not been
previously computed to a constant sequence corresponding to the initial condition at that node for all
time in [0,K]. The iterative procedure begins by setting the past sequence equal to the present sequence for
each circuit node in the SCC. The individual blocks within the SCC are then simulated according to
the ordering R_i over the entire time interval [0,K] by algorithms described in the previous chapters. In
each case the present sequences at the input nodes of a block are taken as the input sequences for
simulation and the present sequences at the output nodes of the block are updated after the simulation. The
procedure EQUAL then checks for equality between the present sequence and the past sequence during
the time interval [0,K] at each node and returns the value 0 if they are found equal and 1 if not. Here,
two sequences are considered equal if they have the same number of terms and are both type-equal as
well as time-equal as defined in Section 4.1 in Chapter 4. The iterations are carried out until both
present and past sequences are found equal for each node in the SCC.

Algorithm 6.1

Input  : A strongly connected component E_i and an ordering R_i,
         such that the blocks within E_i are arranged according to R_i.
Output : Sequences of transitions at output nodes of each block within E_i.

procedure WR_SIM (E_i, R_i, 0, K)
begin
    for each node n_k ∈ N_i do
    begin
        if (S_k = ∅) then
            S_k ← (u, s_k, -1);
        end if
    end
    repeat
        for each node n_k ∈ N_i do
            \bar{S}_k ← S_k;
        for j ← 1 until p do
            if Ω_j is an MFB then
                MFB_SIM (Ω_j, 0, K)
            else if Ω_j is a PTB then
                PTB_SIM (Ω_j, 0, K)
            end if
        ind ← 0;
        for each node n_k ∈ N_i do
        begin
            ind ← ind + EQUAL (\bar{S}_k, S_k, 0, K);
        end
    until ind = 0
end
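The iteration structure of the waveform relaxation scheme can be sketched in Python. This is a simplified stand-in, not the MOSTIM implementation: each block is modeled as a hypothetical callable that maps the present sequences to new sequences at its output nodes, and sequences are represented by sampled tuples rather than lists of transitions.

```python
def wr_sim(blocks, present, max_iters=100):
    """Waveform relaxation over blocks given in their a priori order.

    blocks  : list of callables; each takes the dict of present
              sequences and returns {output_node: new_sequence}
    present : {node: sequence}, updated in place
    Returns (present, number_of_iterations).
    """
    for iteration in range(1, max_iters + 1):
        past = dict(present)               # past sequence <- present sequence
        for block in blocks:               # simulate over the whole interval
            present.update(block(present))
        if all(present[n] == past[n] for n in present):
            return present, iteration      # every node unchanged: converged
    raise RuntimeError("waveform relaxation did not converge")

def nor(u, v):
    return tuple(int(not (a or b)) for a, b in zip(u, v))

def inv(u):
    return tuple(1 - a for a in u)

# Two blocks in a feedback loop: a = NOR(x, b) and b = NOT(a).
blocks = [
    lambda s: {"a": nor(s["x"], s["b"])},
    lambda s: {"b": inv(s["a"])},
]
present = {"x": (0, 0, 1, 1), "a": (0, 0, 0, 0), "b": (0, 0, 0, 0)}
result, iterations = wr_sim(blocks, present)
```

The final pass changes nothing and serves only to detect convergence, which mirrors the role of the EQUAL check in the algorithm.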

We now discuss several features of the above algorithm. We first consider obtaining an a priori
ordering R_i on the blocks of the SCC. Given any such ordering, we define a node to be initially relaxed
if it is an input node of a block within the SCC and its present sequence has not yet been updated in
the current iteration at the time of simulating the block. In the above algorithm, the present sequence
of a node gets updated only after simulating the block to which it is an output node. Hence, in the case
of an initially relaxed node the blocks in its fanin list are ordered after the blocks in its fanout list.
Given an ordering on the vertices of a digraph, we say that an arc is a forward arc if its tail vertex
appears before its head vertex in the ordering; otherwise, the arc is said to be a feedback arc. If we
consider the vertices of the derived digraph, as defined in Chapter 3, corresponding to the blocks within
the SCC E_i, then any ordering R_i would result in a set of feedback arcs. Furthermore, the number of
feedback arcs produced by R_i is an upper bound on the number of initially relaxed nodes due to R_i.
Clearly, the best choice for R_i is one which results in the least number of initially relaxed nodes since
this would speed up the convergence of the above algorithm. However, this corresponds to finding an
ordering that results in the minimum number of feedback arcs, which is an NP-Complete problem
[52,53,57]. Therefore, the choice of the a priori ordering R_i affects the speed of convergence of the
above algorithm and finding the best ordering, in this respect, turns out to be a difficult problem from
the computational complexity point of view. This is one of the drawbacks of the waveform relaxation
scheme.
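The definition of a feedback arc translates directly into code. The Python sketch below counts feedback arcs for a given ordering and, for very small digraphs only, finds a best ordering by exhaustive search. The function names are illustrative, and the brute-force search is included purely to show the objective; it is not the algorithm referred to in the text.

```python
from itertools import permutations

def feedback_arcs(arcs, order):
    """Arcs (tail, head) whose tail does not appear before its head."""
    position = {v: i for i, v in enumerate(order)}
    return [(t, h) for t, h in arcs if position[t] >= position[h]]

def best_ordering(vertices, arcs):
    """Exhaustive search; exponential, so usable only on tiny digraphs."""
    return min(permutations(vertices),
               key=lambda order: len(feedback_arcs(arcs, order)))

# Derived digraph of a three-block ring: A -> B -> C -> A.
ring = [("A", "B"), ("B", "C"), ("C", "A")]
good = feedback_arcs(ring, ["A", "B", "C"])
bad = feedback_arcs(ring, ["C", "B", "A"])
best = best_ordering(["A", "B", "C"], ring)
```

For the ring, the ordering A, B, C leaves one feedback arc while the reversed ordering leaves two, which shows how the choice of ordering changes the number of initially relaxed nodes.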

Another  aspect  that  needs  to  be  considered  is  that  the  number  of  iterations  turns  out  to  be  propor¬ 
tional  to  the  number  of  transitions  at  the  various  circuit  nodes  in  certain  circuits  such  as  the  ring  oscil¬ 
lator.  This  is  also  one  of  the  major  drawbacks  in  the  waveform  relaxation  method  WRM  [9].  Finally, 
this  scheme  requires  storing  two  sequences  of  transitions  for  the  entire  time  interval  [0,K]  at  each  node 
which  could  be  a  considerable  amount  of  computer  storage  for  large  SCCs.  In  spite  of  all  these  draw¬ 
backs,  this  scheme  could  still  be  used  in  our  type  of  switch-level  simulation  since  it  is  easy  to  imple¬ 
ment  and  is  compatible  with  the  delay  and  filtering  operations.  In  Appendix  II,  we  will  discuss  the 
problem  of  finding  an  optimum  ordering  that  results  in  the  minimum  number  of  feedback  arcs  in  a 
digraph.  We  also  discuss  an  algorithm,  proposed  by  Younger  [60],  that  finds  such  an  ordering  in  case  of  a 
general digraph. This would then be the a priori ordering R_i used in Algorithm 6.1.

An alternative approach is to use the time-point relaxation method for the simulation of the
entire partitioned network Ω(N,M,E). In this approach there is no need to handle blocks within an
SCC in a special way since the scheme is event-driven, as discussed in Section 2.3.1, and is used in
several digital simulators [13,17,19,25,26]. In order to use this approach in our type of simulation, we
could define an event as a transition (x,y,k_i) occurring at time k_i. A time queue (TQ) is used to
maintain a list of events occurring at different instants of time. If an event (x,y,k_i) occurs at some node n_j
in the network, then all the blocks in the fanout list of n_j are processed at time k_i. If on processing a
block at k_i, a transition is observed at an output node of the block, then this is defined as a new event,
and is scheduled to occur at time k_j > k_i. Thus, k_j - k_i > 0 is a positive delay in propagating an event
occurring at an input node of a block to an output node of the block. It is this feature that makes the
use of time-point relaxation particularly attractive for processing blocks within feedback loops.
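A time queue of this kind is commonly realized with a priority queue. The Python sketch below is a generic illustration of event-driven simulation with a strictly positive delay per block. The data layout, the function name time_point_sim, and the fixed unit delays are assumptions and are not taken from any of the cited simulators.

```python
import heapq

def time_point_sim(fanout, evaluate, delay, initial, events, t_end):
    """Event-driven simulation with a time queue.

    fanout[node]     : blocks having node as an input
    evaluate(b, st)  : returns (output_node, new_value) of block b
    delay[b]         : strictly positive propagation delay of block b
    events           : initial (time, node, value) events
    """
    state = dict(initial)
    tq = list(events)
    heapq.heapify(tq)                        # earliest event first
    trace = []
    while tq:
        t, node, value = heapq.heappop(tq)
        if t > t_end:
            break
        if state[node] == value:
            continue                         # no change, hence no event
        state[node] = value
        trace.append((t, node, value))
        for block in fanout.get(node, []):
            out, new = evaluate(block, state)
            if new != state[out]:            # schedule strictly later
                heapq.heappush(tq, (t + delay[block], out, new))
    return trace

# Three-inverter ring oscillator; each block is an (input, output) pair.
ring = [("a", "b"), ("b", "c"), ("c", "a")]
fanout = {inp: [(inp, out)] for inp, out in ring}
trace = time_point_sim(
    fanout,
    evaluate=lambda blk, st: (blk[1], 1 - st[blk[0]]),
    delay={blk: 1 for blk in ring},
    initial={"a": 0, "b": 1, "c": 0},
    events=[(0, "a", 1)],
    t_end=5,
)
```

Because every new event is scheduled strictly after the event that caused it, the feedback loop of the ring needs no special handling; the queue is simply processed in time order.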


In our type of switch-level simulation, the emphasis is on generating accurate timing estimates,
which is possible by using the delay and filtering operations described in Chapter 5. However, the
delay operator can only operate on a pair of complete transitions and therefore, events can be
propagated through a block only in pairs. Consider an example of an inverter with a sequence
(0,u,k_1), (u,1,k_2) at its input node causing a sequence (1,u,k'_1), (u,0,k'_2) at its output node. In order to
use the time-point relaxation scheme, we would have to be able to compute the value of k'_1 only with
the knowledge of the input event (0,u,k_1). This is however impossible, since the delay operator needs
to know the values of both k_1 and k_2 before it can compute k'_1 and k'_2. Furthermore, it is possible to
have k'_2 < k_2, which means that the input event (u,1,k_2) causes the output event (u,0,k'_2) at an earlier
time, thus violating the basic assumption that one only advances in time in the TQ and never has to
backtrack. Therefore, the time-point relaxation method, as such, is not suitable for our type of
simulation.

6.2  Event-driven  Dynamic  Windowing  Algorithm 

In the previous section we discussed two relaxation methods to simulate the blocks in a network.
The first method, namely, the waveform relaxation method, could be used in our type of simulation
since it is compatible with the delay and filtering operations, but suffers from several drawbacks in the
case of blocks within a strongly connected component. The second method, namely, the time-point
relaxation method, is used in several digital simulators, mainly because blocks within strongly
connected components do not pose any special problems, but it is found to be incompatible with the delay
and filtering operations, and hence, cannot be used, as such, in our type of simulation. In this section we
describe a new scheme to handle blocks within an SCC which overcomes most of the above drawbacks in
the waveform relaxation method by incorporating some of the ideas of the time-point relaxation
method. The main idea is to use the so-called windowing technique in the waveform relaxation
procedure, as suggested in [11,12], wherein it is shown that the number of iterations is exponentially
proportional to the size of the time interval of analysis. This suggests dividing the entire time interval of
interest into many time slots or windows so that waveform relaxation can be performed within each
window. These waveforms generate initial conditions for the next window and so on. If all the
windows have the same size, then there exists an optimum number of windows which minimizes the total
number of iterations (and hence the total CPU time for analysis) as shown in [11].

The choice of windows, however, is very crucial in our type of switch-level simulation since the
initial states at each node for each window must be the steady states 0 or 1 in order to obtain good
timing through the delay operator, and to perform the filtering operation successfully. This appears to be a
no-win situation since deciding on the placement of windows seems to require prior knowledge of the
digital waveform (or sequences of transitions) at each circuit node within the SCC. Here we describe a
successful solution to this problem by using a sequential list of time intervals which is dynamically
updated as the algorithm progresses. In addition, the new scheme is event-driven, and therefore
requires no a priori ordering of blocks within an SCC. Before going into the description of the
algorithm, a few definitions and notations are needed.

Consider an SCC E_i consisting of a set of blocks Ω_1, Ω_2, ..., Ω_p. Let EXT_i denote those circuit
nodes in the blocks within the SCC for which the node sequences have already been computed. For
each circuit node n_k in the SCC, let FO(n_k) = FOUT(n_k) ∩ E_i denote the set of blocks within E_i for
which n_k is an input node.

Definition : A transition interval for a node is the time interval during which the node is in the
intermediate state u. Associated with each transition interval I for a node n_k is a fanout list of blocks,
denoted by F(I), which is initially set to FO(n_k). Let a(I) and b(I) denote the initial and final times of
the transition interval I.

Let I_1 and I_2 be any two transition intervals. We say that I_1 < I_2 if and only if b(I_1) < a(I_2). If
I_1 ∩ I_2 ≠ ∅, then we say that I_1 and I_2 are incomparable. We thus have introduced the notion of a
partial order "<" on a set of intervals. Let L = {I_1, I_2, ..., I_q} be a sequential list of intervals. We say that
L is an ordered list if I_1 < I_2 < ... < I_q. We say that an interval I is contained in L if I ⊆ I_j for some
I_j ∈ L. Given any interval I and an ordered list of intervals L, the following procedure returns an
updated ordered list \hat{L} containing the interval I.


Input  : An ordered list L = {I_1, I_2, ..., I_q} of intervals,
         and a new interval I.
Output : A new ordered list \hat{L} containing I.

procedure INCLUDE (I, L)
begin
    \hat{L} ← ∅;
    η ← 1;
    for j ← 1 until q do
    begin
        if I_j < I then
            \hat{L} ← \hat{L} ∪ I_j;
        else if I ∩ I_j ≠ ∅ then
            F(I) ← F(I) ∪ F(I_j);
        else if I < I_j then
            if η = 1 then
                \hat{L} ← \hat{L} ∪ I;
            end if
            η ← 0;
            \hat{L} ← \hat{L} ∪ I_j;
        end if
    end
    if η = 1 then
        \hat{L} ← \hat{L} ∪ I;
    end if
    return \hat{L};
end
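The insertion of an interval into an ordered list can be sketched in Python as follows. The class and function names are hypothetical. One point goes beyond what the procedure states explicitly: when the new interval overlaps an existing one, this sketch also widens the new interval to cover both, which is an assumption made so that the result remains a list of disjoint intervals.

```python
from dataclasses import dataclass, field

@dataclass
class Interval:
    a: float                                  # initial time a(I)
    b: float                                  # final time b(I)
    fanout: set = field(default_factory=set)  # fanout list F(I)

def precedes(i1, i2):
    """The partial order I1 < I2, i.e. b(I1) < a(I2)."""
    return i1.b < i2.a

def include(new, ordered):
    """Return a new ordered list containing the interval new."""
    new = Interval(new.a, new.b, set(new.fanout))   # do not alter the input
    result, placed = [], False
    for cur in ordered:
        if precedes(cur, new):
            result.append(cur)
        elif precedes(new, cur):
            if not placed:                    # new goes just before cur
                result.append(new)
                placed = True
            result.append(cur)
        else:                                 # overlap: absorb cur into new
            new.fanout |= cur.fanout
            new.a = min(new.a, cur.a)         # assumption: take the hull
            new.b = max(new.b, cur.b)
    if not placed:                            # new is last, or list was empty
        result.append(new)
    return result

L = [Interval(1, 2, {"A"}), Interval(5, 6, {"B"})]
L = include(Interval(3, 4, {"C"}), L)         # fits between the two
merged = include(Interval(1.5, 5.5, {"D"}), L)  # overlaps all three
```

The second call shows how an interval that overlaps several entries collects all of their fanout lists, which is how additional blocks can become attached to a window.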


The algorithm for the new dynamic windowing technique can now be described as follows. The
ordered set L is initialized to the empty set. Every transition interval at each node in EXT_i is included
in L. The set L is altered dynamically as the algorithm progresses. At any stage, we have a partition of
the entire time interval [0,K] into windows by taking the final times of the disjoint intervals in L as
the boundaries of the windows. The set L plays the role of the time queue (TQ) used in the time-point
relaxation method. Here events take place over transition intervals rather than occurring
instantaneously. If a transition interval at an input node of a block causes a transition interval at an output node
of the block, then the end points of the new interval can be computed by our delay operator. Thus,
this new scheme is compatible with our delay and filtering operations.
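The way the list L induces a partition of [0,K] into windows can be made concrete with a few lines of Python. The function name and the tuple representation of intervals are illustrative assumptions.

```python
def windows_from_intervals(intervals, K):
    """Partition [0, K] using the final times of the intervals as boundaries.

    intervals is an ordered list of pairwise disjoint (a, b) pairs
    lying within [0, K].
    """
    bounds = [0]
    for _, b in intervals:
        if bounds[-1] < b < K:        # each final time closes one window
            bounds.append(b)
    bounds.append(K)
    return list(zip(bounds, bounds[1:]))

wins = windows_from_intervals([(1, 2), (4, 6)], K=10)
```

Each window therefore ends where a transition interval ends, so every node is in a steady state at the start of the next window.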


Algorithm 6.2

procedure WIN_SIM (E_i)
begin
    L ← ∅;
    for each circuit node n_k ∈ EXT_i do
    begin
        for each transition interval I_j of n_k do
        begin
            L ← INCLUDE (I_j, L);
        end
    end
    K_2 ← 0;
    while L is not empty do
    begin
        I ← first interval in L;
        K_1 ← K_2;
        while F(I) is not empty do
        begin
            K_2 ← b(I);
            Ω_r ← first block in F(I);
            for each output node n_k of Ω_r do
                \bar{S}_k ← WINDOW (S_k, K_1, K_2);
            if Ω_r is an MFB then
                MFB_SIM (Ω_r, K_1, K_2)
            else if Ω_r is a PTB then
                PTB_SIM (Ω_r, K_1, K_2)
            end if
            for each output node n_k of Ω_r do
            begin
                \hat{S}_k ← WINDOW (S_k, K_1, K_2);
                for each transition interval I_m of n_k do
                begin
                    if I < I_m then
                        L ← INCLUDE (I_m, L);
                    else if \hat{S}_k ≠ \bar{S}_k then
                        L ← INCLUDE (I_m, L);
                    end if
                end
            end
            delete the first block from F(I);
        end
        delete the first interval from L;
    end
end

The above algorithm to process the blocks within an SCC begins by forming L by including each
transition interval of each circuit node in EXT_i. The first interval in L is chosen as the window of
interest. The blocks in its fanout list, which are MFB’s and PTB’s, are then simulated only for the duration of the
present window until the list is empty. Each time a block gets processed, a transition interval in an
output node is included in L if and only if one of the following two conditions is satisfied:

a) All transitions in the output node occur after the present window, I.

b) The transitions at the output node, occurring during the present window after processing the
block, are different from those before processing the block.

After the fanout list for the present window is empty the interval is deleted from L and the
whole process is repeated until L is empty.

Consider the execution of the above algorithm on an SCC E_i. After the initialization of L by
including the transition intervals of the nodes in EXT_i, it could get updated by the inclusion of the
transition  intervals  at  the  output  nodes  of  the  block  that  has  been  just  simulated.  This  could  alter 
either  the  endpoint,  K2,  of  the  present  window,  or  could  append  a  set  of  blocks  to  the  existing  fanout 
list  F(I)  of  the  present  window.  If  the  latter  situation  continues,  it  is  possible  that  a  block  could  reap¬ 
pear  in  the  fanout  list  of  the  present  window,  after  it  has  been  deleted  before,  and  is  hence  resimulated 
during  the  present  window.  We  say  that  an  SCC  is  well-behaved  if,  during  the  execution  of  Algorithm 
6.2,  none  of  its  blocks  is  ever  resimulated  during  the  same  window. 

Thus,  in  a  well-behaved  SCC  the  delay  characteristics  of  the  various  blocks  are  such  that  one  does 
not  have  to  perform  any  iterations  at  all.  If,  however,  the  SCC  is  not  well-behaved,  then  the  algorithm 
extends  the  fanout  list  of  the  present  window  and  resimulates  the  active  blocks  until  convergence  is 
achieved  for  the  duration  of  the  present  window.  This  is  equivalent  to  performing  waveform  relaxa¬ 
tion  iterations  within  the  present  window.  It  is  possible  to  conjure  up  an  SCC  for  which  the  initial 
window  gets  continually  extended  until  it  becomes  the  entire  time  interval.  In  this  case  using  the 
above Algorithm 6.2 becomes equivalent to Algorithm 6.1. However, such a situation is of theoretical
interest only, and probably never occurs in practical circuits. Thus, in the worst case, the new dynamic
windowing technique performs at least as well as the waveform relaxation method. In fact, the SCCs
in several practical circuits considered were all well-behaved, in which case Algorithm 6.2 performs
much better than Algorithm 6.1 in all respects. To begin with, there is no need to place the blocks of
the SCC in any particular order, since the procedure in Algorithm 6.2 is event-driven, i.e., only those
blocks that are active in a window are processed during that window. Secondly, no iterations are
performed in case of a well-behaved SCC, thereby saving considerable amounts of computation time.
Finally, the active blocks are processed only during a window (and not for the entire time interval),
thus causing a reduction in both computation time and memory space required to store the sequences of
transitions.




CHAPTER  7 


MOSTIM  :  IMPLEMENTATION  AND  PERFORMANCE 


The algorithms described in Chapters 3 to 6 have been implemented in a computer program called
MOSTIM, a switch-level timing simulator for NMOS circuits. MOSTIM is written in FORTRAN and
runs on a VAX 11/780 computer with the UNIX operating system. It has about 9600 lines of FORTRAN
code, which includes about 5800 lines from the front end of SPICE2G.1. The main flow chart
for MOSTIM is shown in Figure 7.1. The NMOS network is described to MOSTIM in the same input
description language as SPICE2 [1]. The three overlays MAIN, READIN, and ERRCHK, borrowed from
SPICE2G.1, read in the input file describing the network and establish the data base to store the
necessary information about the circuit elements, their model parameters, and their interconnections. A
dynamic memory manager is used to allocate space for each element. The input description language
allows the use of a multilevel hierarchy of subcircuits, which is flattened out in the ERRCHK overlay.
This overlay also checks for topological errors, such as a node connected to fewer than two circuit
elements or a loop of voltage sources, as well as errors in the specifications of the model parameters for
the circuit elements. The subroutine PARTITION then partitions the NMOS network into MFB’s,
PTB’s, and SRC’s, using the algorithms described in Chapter 3 of this thesis. The set of blocks in the
partitioned network is then further partitioned into strongly connected components (SCCs), and these are
ordered by subroutine ORDER. The subroutine SIMULATION processes the SCCs in the above ordering.
If an SCC is simple (a single block), then the appropriate subroutine SRC_SIM, MFB_SIM, or PTB_SIM,
described in Chapter 4, is used to simulate the block for the entire time interval of interest. If an SCC
contains more than one block, then it is simulated by subroutine WIN_SIM, which, in turn, uses subroutines
MFB_SIM and PTB_SIM to simulate the individual MFB’s and PTB’s over windows in time, as
described in Chapter 6. The subroutines MFB_SIM and PTB_SIM interact dynamically with
subroutines DELAY and FILTER, described in Chapter 5, to alter the transition times of the zero-delay
sequences produced and to filter the resulting delayed sequences. Extensive use of linked lists is made
throughout the program. These linked lists are implemented in FORTRAN with the help of
one-dimensional arrays.
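As an illustration of how a linked list can be kept in one-dimensional arrays, the following is a minimal sketch in Python rather than FORTRAN. It is not taken from the MOSTIM source; the class name, the array names, and the use of index 0 as the null link are all assumptions made for the example.

```python
# Sketch of a singly linked list stored in parallel one-dimensional arrays,
# in the spirit of pointer-free FORTRAN code. All names are hypothetical.
NIL = 0  # index 0 is reserved as the "null" link


class ArrayLinkedList:
    def __init__(self, capacity):
        # value[i] holds the payload of cell i; nxt[i] is the index of the next cell.
        self.value = [None] * (capacity + 1)
        self.nxt = [NIL] * (capacity + 1)
        # Chain cells 1..capacity into a free list.
        for i in range(1, capacity):
            self.nxt[i] = i + 1
        self.free = 1 if capacity > 0 else NIL
        self.head = NIL

    def push_front(self, item):
        """Take a cell from the free list and link it in at the head."""
        if self.free == NIL:
            raise MemoryError("no free cells left")
        cell = self.free
        self.free = self.nxt[cell]
        self.value[cell] = item
        self.nxt[cell] = self.head
        self.head = cell

    def pop_front(self):
        """Unlink the head cell and return it to the free list."""
        if self.head == NIL:
            raise IndexError("list is empty")
        cell = self.head
        item = self.value[cell]
        self.head = self.nxt[cell]
        self.nxt[cell] = self.free
        self.free = cell
        return item

    def items(self):
        """Walk the chain from the head and collect the payloads."""
        out, cell = [], self.head
        while cell != NIL:
            out.append(self.value[cell])
            cell = self.nxt[cell]
        return out
```

Keeping the unused cells on a free list means that insertion and deletion only rewrite a few array entries, which is presumably why this layout suits a language without pointers.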

We now evaluate the performance of MOSTIM based on its computational speed (complexity) and
the accuracy of its switch-level timing (SLT) estimates. We first evaluate the computational speed by
considering several examples. The first example is a combinatorial NAND gate implementation of a
one-bit full-adder circuit, shown in Figure 7.2, which was cascaded to produce full-adders from one to
four bits. Table 7.1 shows the rate of growth of CPU-time versus the number of transistors. The total
CPU-time taken by MOSTIM includes the time taken for partitioning and ordering, as well as the time
for the switch-level simulation and the delay and filtering operations. The total job times taken by SLATE
[3] and SPICE2G.1 [1] are also provided for comparison.


Table 7.1 : The growth rate of CPU-time of MOSTIM, SLATE, and SPICE2G.1

Adder    Number of               CPU-seconds
Bits     Transistors    MOSTIM    SLATE    SPICE2G.1

1         33             1.40      61.1      184.0
2         66             2.03     133.2      371.1
3         99             2.55     195.8      556.3
4        132             3.45     252.9      767.0


will  represent  analog  waveforms  produced  by  SPICE2G.1  with  solid  lines  and  the  ternary  digital 
waveforms  produced  by  MOSTIM  with  dotted  lines.  The  waveforms  at  the  output  of  every  fifth 
inverter  in  a  50-inverter  chain  produced  by  both  MOSTIM  and  SPICE2G.1  are  shown  in  Figure  7.3(b). 
Table  7.2,  below,  gives  the  CPU-times  taken  by  both  MOSTIM  and  SPICE2G.1  for  a  chain  of  identical 
inverters.  These  values  are  plotted  against  the  number  of  inverters  in  the  chain  in  Figures  7.3(c)  and 
7.3(d). 


Table  7.2  :  CPU-times  taken  by  MOSTIM  and  SPICE2G.1  on  a  chain  of  inverters 


From both of the examples considered above, it can be concluded that the CPU-time taken by MOSTIM
grows linearly with circuit size and that MOSTIM is around two orders of magnitude faster than SPICE2G.1.
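The linear growth can be checked directly against the figures in Table 7.1. The following sketch, which is only an illustration and not part of MOSTIM, fits a least-squares line to the MOSTIM times and computes the ratio of SPICE2G.1 time to MOSTIM time for each adder. The function and variable names are our own.

```python
# Data copied from Table 7.1 (full adders of one to four bits).
transistors = [33, 66, 99, 132]
mostim_sec = [1.40, 2.03, 2.55, 3.45]
spice_sec = [184.0, 371.1, 556.3, 767.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(transistors, mostim_sec)
speedups = [s / m for s, m in zip(spice_sec, mostim_sec)]
print(f"MOSTIM: about {slope * 1000:.1f} ms per transistor plus {intercept:.2f} s overhead")
print("SPICE2G.1 / MOSTIM ratios:", [round(r) for r in speedups])
```

For these four data points the ratios lie between roughly 130 and 220, which is consistent with the two orders of magnitude quoted above.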


We now consider several examples of NMOS circuits simulated using MOSTIM. A one-bit full-adder
circuit with pass transistors used to realize part of the logic is shown in Figure 7.4(a) and a
cascaded two-bit adder in Figure 7.5(a). The input and output waveforms of these two circuits are shown
in Figures 7.4(b) and 7.5(b), respectively. The presence of a partial pair of transitions in a ternary
digital waveform indicates the presence of a glitch in the corresponding analog waveform. We classify a
glitch as a major glitch or a minor glitch according to whether or not the glitch crosses a threshold
limit. MOSTIM indicates only major glitches in the plots of its waveforms. However, every glitch,
major or minor, is flagged and printed out in a separate diagnostic file for each circuit if the user
requests it. An SR-flip-flop circuit is shown in Figure 7.6(a) and its waveforms in Figure 7.6(b). A
three-stage ring oscillator is shown in Figure 7.7(a). The final partition of the interval [0.0ns,40.0ns]
into windows, along with the list of blocks to be simulated in each window, is given in Table 7.3.
Here MFB1 is the two-input NOR gate, and MFB2 and MFB3 are the two inverters. The
waveforms for this circuit are shown in Figure 7.7(b).

A  one-bit  register  is  shown  in  Figure  7.8(a).  It  is  used  to  realize  a  three-bit  shift  register  shown  in 
Figure  7.8(b)  which  can  shift  both  left  (down)  or  right  (up).  Pass  transistors  are  made  use  of  in  several 
places in the circuit: first, to load the input data onto a bus (node 1), then to transfer data between the
bus  and  registers  and  also  to  precharge  the  bus.  The  input  waveforms  applied  and  the  output 
waveforms  produced  are  shown  in  Figure  7.8(c).  A  tally  circuit  composed  of  only  pass  transistors  [56] 
is  shown  in  Figure  7.9(a).  In  this  circuit,  all  the  pass  transistors  constitute  a  single  PTB.  The 
waveforms  for  this  circuit  are  shown  in  Figure  7.9(b).  The  simulations  of  the  three-bit  shift  register 
circuit  and  the  tally  circuit  test  the  performance  of  the  mapping  technique  of  the  delay  operator  using 
Elmore-equivalent  capacitances  as  described  in  Chapter  5.  Finally,  we  consider  a  PLA  with  149  transis¬ 
tors  as  shown  in  Figure  7.10(a).  This  network  is  partitioned  into  42  MFB’s  and  12  PTB’s.  The  only 
nontrivial  SCC  in  the  partitioned  network  consists  of  17  MFB’s  and  4  PTB’s.  The  waveforms  for  this 
circuit  are  shown  in  Figure  7.10(b). 

Among  all  the  networks  described  above,  let  us  first  consider  those  networks  with  feedback. 
Table  7.4  compares  the  performance  of  the  waveform  relaxation  method  (Algorithm  6.1)  and  the  new 
event-driven  dynamic  windowing  scheme  (Algorithm  6.2)  used  to  simulate  the  blocks  within  the 
SCCs.  This  table  demonstrates  that  the  new  windowing  technique  performs  considerably  better  and  is 
more  efficient  than  the  waveform  relaxation  method.  In  Table  7.5  we  provide  a  list  of  all  the  circuits 
that have been simulated using MOSTIM thus far, along with the number of transistors (indicated in
parentheses), and the CPU-time taken by MOSTIM. The CPU-time taken by SPICE2G.1 is also given
for comparison.

Figure 7.6(a) : An SR-flip-flop

Table 7.3 : Final list of windows for a three-stage ring oscillator


Table 7.4 : CPU-seconds taken by Algorithms 6.1 and 6.2 to simulate networks with feedback

CIRCUIT                                      MOSTIM                  SPICE2G.1
(number of transistors)            Algorithm 6.1   Algorithm 6.2

3-stage Ring Oscillator (7)             5.23            1.05           104.60
SR-flip-flop (12)                       1.33            0.86            90.37
3-bit Shift Register (29)              19.23            6.55           363.30
15-stage Ring Oscillator (31)            -              1.36           139.85
2-bit Full Adder (42)                   7.82            5.27           794.25
PLA (149)                              13J6             5.85           827.43

From each of the waveforms in the circuits described above, one can easily verify that the SLT
estimates generated by MOSTIM for pairs of complete transitions are fairly accurate. More precisely,
consider the sequence of transitions S at some node in a circuit that is produced by MOSTIM and let S'
be the ternary equivalent of the analog waveform produced by SPICE2G.1 at the same node. We then
consider the extended measure p(S,S'), defined in Equation 4.2 of Chapter 4, to be the measure of the
accuracy of the SLT estimates generated by MOSTIM. Figure 7.11 is a scatter plot of the transition
times of complete pairs of transitions as computed by MOSTIM against the corresponding threshold
crossing times of the analog waveform as computed by SPICE2G.1 for each node in each of the circuits
listed in Table 7.5. The maximum percentage error in the timing estimates produced by MOSTIM in
all these circuits is 8.75%. For purely combinational logic circuits with no pass transistors, such as the
chain of inverters shown in Figure 7.3(a), the error is less than 3%.


Table 7.5 : A list of circuits simulated by MOSTIM

CIRCUIT                                      CPU-seconds
(number of transistors)                  MOSTIM    SPICE2G.1

3-stage Ring Oscillator (7)               1.05      104.60
SR-flip-flop (12)                         0.86       90.37
Tally circuit (18)                        3.59      132.37
1-bit Full Adder (21)                     1.28      119.32
3-bit Shift Register (29)                 6.55      363.30
15-stage Ring Oscillator (31)             1.36      139.85
2-bit Full Adder (42)                     5.27      794.25
50-inverter chain (100)                   3.19      645.28
4-bit Combinatorial Full Adder (132)      3.45      767.00
PLA (149)                                 5.85      827.43

In the case of RSIM [26], which is also a switch-level timing simulator for MOS circuits, some of
the timing predictions, even for purely combinational circuits, have been reported to be within around
30% of those of SPICE2. For circuits with chains of pass transistors, the predictions are even less accurate.
In comparison, the results presented in this chapter indicate that MOSTIM is capable of generating timing
estimates within 10% of those of SPICE2G.1 at speeds around two orders of magnitude higher, which
is about the same speed improvement as obtained with RSIM.


CHAPTER 8


CONCLUSIONS 


The  aim  of  switch-level  timing  simulation  of  VLSI  circuits  is  to  provide  the  circuit  designer  with 
digital  waveforms  at  various  nodes  in  the  circuits  with  special  emphasis  on  the  accuracy  of  the  times  at 
which  the  signals  change  state.  In  this  dissertation  we  have  described  a  switch-level  timing  simulator 
for  NMOS  circuits  which  is  a  fast  and  accurate  simulation  tool  that  gives  adequate  information  on  the 
performance  of  the  circuit  with  a  reasonable  expenditure  of  computation  time  even  for  very  large  cir¬ 
cuits.  In  Chapter  2  of  this  thesis  we  reviewed  some  of  the  existing  simulators  for  integrated  circuits 
and  classified  them  into  two  distinct  categories,  namely,  analog  simulators  and  digital  simulators.  We 
found  that  digital  simulators  in  general  operate  at  sufficient  speeds  to  test  entire  VLSI  systems,  since  the 
circuit  behavior  is  modeled  at  a  logical  rather  than  a  detailed  electrical  level.  However,  these  simula¬ 
tors  do  not  model  the  dynamics  of  the  circuits  properly  and  are  often  useful  only  in  predicting  steady- 
state  responses  of  the  signals.  Analog  simulators,  on  the  other  hand,  predict  both  steady-state  and  tran¬ 
sient  responses  fairly  accurately,  but  are  cost-effective  only  for  circuits  with  less  than  a  few  thousand 
components,  which  are  considered  small  in  the  present  day  VLSI  technology. 

The algorithms presented in this thesis have led to the development of a switch-level timing
simulator for NMOS VLSI circuits called MOSTIM, an attempt to bridge the gap between analog and
digital simulators. MOSTIM performs simulations at the switch level and, hence, runs at speeds close to
those of digital simulators. Furthermore, it uses a delay operator to delay signal transitions accurately
and, hence, provides timing accuracy comparable to that of analog simulators.

In Chapter 3, we discussed the algorithms for partitioning the input network into various blocks
and the ordering of these blocks for processing. The key to the partitioning strategy is to divide the set
of enhancement transistors into driver transistors and pass transistors. We presented a graph-theoretic
algorithm that achieves this in computation time which is linear in the number of enhancement
devices. The driver transistors were then grouped together to form multifunctional blocks (MFB) and
the pass transistors were grouped together to form pass transistor blocks (PTB). We created a third type
of block called input source (SRC) to model voltage sources, clocks, etc. We then constructed a directed
graph G with vertices corresponding to the various circuit blocks, namely, MFB’s, PTB’s, and SRC’s, and
directed arcs describing the interconnections between them. A modified version of depth-first search
known as Tarjan’s algorithm [31] is used to detect strongly connected components (SCC) in G. The
vertices within an SCC correspond to blocks forming feedback loops in the original circuit and are
collapsed into single vertices, thus creating an acyclic reduced graph. The vertices of the reduced graph
are then placed in topological order for processing.
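The detection of SCCs and their ordering can be sketched as follows. This is a textbook recursive form of Tarjan's algorithm written in Python for illustration, not the ORDER subroutine of MOSTIM; the function names and the dictionary representation of the block graph are assumptions. It relies on the known property that Tarjan's algorithm emits the SCCs in reverse topological order of the reduced graph.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each block to the blocks it drives.

    Returns the SCCs in reverse topological order of the reduced graph.
    """
    index, low, on_stack = {}, {}, set()
    stack, sccs = [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def processing_order(graph):
    """Reverse Tarjan's output so each SCC comes after the SCCs that feed it."""
    return list(reversed(strongly_connected_components(graph)))


# Hypothetical example: a source driving a three-block ring that feeds a PTB.
blocks = {
    "SRC1": ["MFB1"],
    "MFB1": ["MFB2"],
    "MFB2": ["MFB3"],
    "MFB3": ["MFB1", "PTB1"],
    "PTB1": [],
}
order = processing_order(blocks)
```

In this example the three MFB's of the ring form the only nontrivial SCC, and it is placed between the source and the PTB. For very large networks the recursion would have to be replaced by an explicit stack.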

The  algorithms  for  the  switch-level  simulation  of  multifunctional  blocks  and  pass  transistor 
blocks  are  presented  in  Chapter  4.  An  MFB  is  a  single  output,  multiple  input,  unidirectional  block, 
whose  steady-state  output  is  a  Boolean  function  of  its  inputs.  A  graphical  technique  using  internal 
node  eliminations  is  used  to  evaluate  the  state  of  the  signal  at  the  output,  given  the  input  signal  states. 
No attempt is made to evaluate signals at the internal nodes of the MFB. In the switch-level simulation
of a PTB, however, the signal at every node within the PTB is evaluated. The transistors in a PTB
are modeled as bidirectional switches whose conduction states (i.e., open, closed, or intermediate) are
controlled  by  the  signal  at  the  corresponding  gate  terminals.  A  strong  node  forces  its  state  on  a  weaker 
node  connected  to  it  via  a  path  of  conducting  transistors  at  any  given  time  instant.  The  algorithm  is 
quite  similar  to  the  one  used  in  conventional  switch-level  simulators  such  as  MOSSIM  [19],  except  for 
the  interpretation  of  the  u  state  (or  X  state  as  used  in  MOSSIM).  This  algorithm  also  handles  situations 
of  conflict  between  two  strong  signals,  charge  sharing,  etc. 

The switch-level simulation algorithms described in Chapter 4 generate zero-delay ternary
waveforms for each pullup node in an MFB and each normal node in a PTB. A delay operator
described in Chapter 5 is used to delay pairs of complete transitions (i.e., 0→u followed by u→1, or
1→u followed by u→0) in the zero-delay waveforms. The delay operator computes appropriate delay
values by taking several parameters into account, such as block configuration, loading, device
geometries, and input slew rates. For NMOS technology, knowing the delay characteristics of five
different circuit primitives is sufficient, within reasonable limits of accuracy, to compute delays through
any general MFB or PTB. These five primitives are simulated using an accurate circuit simulator such
as SPICE2 [1] or SLATE [3], for various device and circuit parameters, and the delay values are
extracted and stored in a delay table. This is done in a presimulation phase. During simulation, MOSTIM
then maps an MFB or a PTB into one of the five primitives and obtains the appropriate delay
value through fast table lookup methods, and interpolation when necessary.
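The table lookup with interpolation can be illustrated by a one-dimensional sketch. The delay tables of MOSTIM depend on several parameters; here we assume, purely for illustration, a table indexed by input slew alone, with made-up values, and linear interpolation between grid points. The function name and the clamping at the ends of the table are also assumptions.

```python
from bisect import bisect_right


def lookup_delay(slews, delays, slew):
    """Piecewise-linear interpolation in a table sorted by input slew.

    Values outside the tabulated range are clamped to the end points.
    """
    if slew <= slews[0]:
        return delays[0]
    if slew >= slews[-1]:
        return delays[-1]
    hi = bisect_right(slews, slew)  # first grid point strictly above `slew`
    lo = hi - 1
    frac = (slew - slews[lo]) / (slews[hi] - slews[lo])
    return delays[lo] + frac * (delays[hi] - delays[lo])


# Made-up table for one primitive: input slew (ns) -> delay (ns).
SLEW_NS = [0.5, 1.0, 2.0, 4.0]
DELAY_NS = [0.8, 1.0, 1.5, 2.6]
```

A query that falls on a grid point returns the stored value, and a query between two grid points returns a value on the straight line joining them. A table over several parameters would apply the same idea along each axis.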

In Chapter 6 we discussed techniques used to process blocks within an SCC. In order to perform a
switch-level simulation of a block (MFB or PTB), the waveforms at the input nodes to the block must
necessarily be known. Since this is not possible for blocks within an SCC, these have to be handled
separately. A waveform relaxation technique could be used, wherein the blocks are processed
iteratively in a predetermined order with unknown input waveforms initially relaxed and output
waveforms constantly updated. Several drawbacks of this technique were discussed. A new dynamic
windowing method that overcomes most of these drawbacks was presented. In principle, this new
scheme is quite similar to the classical event-driven time-wheel approach used in conventional logic
simulators [13,19], except that events take place during intervals of time instead of occurring
instantaneously. The entire time interval of analysis is automatically partitioned into variable size windows
such that the signal at each node in each block within the SCC occupies a steady state (i.e., 0 or 1) at the
window boundaries. Associated with each window is a set of blocks scheduled for processing during
that window. This new scheme does not require an a priori ordering of blocks within the SCC, and is
also seen to take less computation time and less storage than the waveform relaxation method.


A number of NMOS circuits have been simulated using MOSTIM. The performance is discussed
in Chapter 7. In all the circuits simulated thus far, MOSTIM provides timing information within
10% of the timing provided by SPICE2 [1], while running approximately two orders of magnitude
faster. MOSTIM also provides much better timing estimates than RSIM [26] at
approximately the same speed of simulation.

We  now  consider  several  extensions  that  could  be  used  to  improve  the  performance  of  MOSTIM. 
At  present,  MOSTIM  is  capable  of  only  handling  NMOS  circuits.  A  few  modifications  are  needed  to 
include  CMOS  technologies  as  well.  In  the  partitioning  scheme,  the  graph  used  to  represent  the  net¬ 
work  would  now  consist  of  two  types  of  edges,  namely,  n-type  and  p-type  edges,  corresponding  to  n- 
channel  and  p-channel  transistors,  respectively.  In  conventional  CMOS  circuits,  pass  transistors  are  usu¬ 
ally  implemented  using  n-channel  and  p-channel  devices  having  common  drain  and  source  nodes.  The 
edges  corresponding  to  these  transistors  can  be  easily  detected  and  removed  from  the  graph.  Once  this  is 
done,  a  pullup  node  can  be  identified  as  a  node  adjacent  to  both  n-type  and  p-type  edges  in  the  resulting 
graph.  One  can  then  use  the  scheme  described  in  Chapter  3  of  this  thesis  to  complete  the  partitioning. 
An  MFB  in  CMOS  would  consist  of  a  network  of  n-channel  devices  between  the  pullup  node  and 
ground  and  a  dual  network  of  p-channel  devices  between  the  pullup  node  and  VDD.  A  PTB  would  also 
consist  of  both  n-channel  and  p-channel  pass  transistors.  The  algorithms  to  perform  the  zero-delay 
switch-level  simulation  remain  primarily  the  same,  except  that  a  p-channel  device  is  modeled  as  a 
switch  that  is  closed  when  its  gate  signal  is  at  0  and  open  when  its  gate  is  at  1.  The  delay  primitives 
have  to  be  redefined  by  using  CMOS  inverters  and  pass  transistors  and  the  delay  functions  have  to  be 
recomputed  for  these  new  primitives.  The  mapping  techniques  used  by  the  delay  operator  must  now 
also  account  for  the  resistances  of  the  p-channel  devices  in  addition  to  those  of  the  n-channel  devices. 
With  the  above  modifications  MOSTIM  can  be  extended  to  handle  CMOS  circuits  as  well. 
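The change in the switch model can be stated compactly. The sketch below is only an illustration with hypothetical names; in particular, mapping a gate signal in the u state to the intermediate conduction state for both device types is our assumption, based on the three conduction states used for NMOS switches.

```python
def switch_state(channel, gate):
    """Conduction state of a transistor switch for a ternary gate signal.

    channel: "n" or "p"; gate: "0", "u", or "1".
    """
    if gate not in ("0", "u", "1"):
        raise ValueError(f"unknown gate state: {gate!r}")
    if channel not in ("n", "p"):
        raise ValueError(f"unknown channel type: {channel!r}")
    if gate == "u":
        # Assumption: an intermediate gate signal gives an intermediate switch.
        return "intermediate"
    if channel == "n":
        return "closed" if gate == "1" else "open"
    # p-channel: closed when the gate is at 0, open when it is at 1.
    return "closed" if gate == "0" else "open"
```

For example, switch_state("p", "0") gives "closed" while switch_state("n", "0") gives "open", so the n-channel and p-channel halves of a CMOS MFB conduct on complementary gate values.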

The  use  of  ratioed  logic,  as  suggested  in  [20,21],  would  result  in  a  better  scheme  to  handle  conflicts 
in  a  PTB.  The  delay  operator  has  to  be  extended  to  provide  better  timing  in  these  situations.  Providing 


better  timing  estimates  in  case  of  charge  sharing  also  needs  to  be  investigated.  Most  conventional  net¬ 
work  extractors  create  an  RC-network  to  model  the  interconnect  regions  in  an  integrated  circuit.  The 
resistance  of  the  metal  lines  can  be  neglected,  but  the  resistances  of  the  polysilicon  and  the  diffusion 
lines have a considerable effect on the propagation delays in the circuit. Using reduced-order modeling
techniques, such as the Elmore time-constant approach, to generate equivalent lumped capacitances
at each node in the circuit is another topic that needs to be investigated. Further research is also needed
to use a MOSTIM-like approach for other technologies, such as bipolar, ECL, and I2L.
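For reference, the Elmore time constant itself is simple to compute for an RC ladder, as the following sketch shows. It illustrates only the standard Elmore formula for a chain of series resistances with grounded capacitances; it does not show how equivalent lumped capacitances would be derived from it, which is the open question raised above. The function name and the ladder representation are assumptions.

```python
def elmore_delays(resistances, capacitances):
    """Elmore time constants at each node of an RC ladder driven from one end.

    resistances[i] is the series resistance into node i and capacitances[i]
    the capacitance from node i to ground. The time constant at node k is
    the sum over i <= k of R_i times the total capacitance at node i and beyond.
    """
    n = len(resistances)
    if n != len(capacitances):
        raise ValueError("need one resistance and one capacitance per node")
    # downstream[i] = total capacitance at node i and all nodes after it
    downstream = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        downstream[i] = downstream[i + 1] + capacitances[i]
    delays, total = [], 0.0
    for i in range(n):
        total += resistances[i] * downstream[i]
        delays.append(total)
    return delays
```

With resistances in kilohms and capacitances in picofarads the results are in nanoseconds. For a two-section ladder with unit values the time constants are 2 and 3, since the first resistance charges both capacitors and the second charges only the last one.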

Thus  far,  we  have  only  considered  the  deterministic  simulation  of  integrated  circuits.  It  is  well- 
known,  however,  that  random  fluctuations  inherent  in  the  IC  manufacturing  process  affect  the  perfor¬ 
mance  of  VLSI  circuits  significantly.  This  is  further  aggravated  by  the  scaling  down  of  device  sizes  and 
the  interconnect  regions.  The  circuit  designer  is,  therefore,  often  interested  in  obtaining  some  statistical 
information  about  the  timing  in  the  circuit.  A  Monte-Carlo  simulation  of  the  entire  VLSI  circuit  can 
prove to be prohibitive in terms of CPU-time. As an alternative, one could compute the statistical
behavior  of  the  delays  through  standard  primitives  using  the  conventional  Monte-Carlo  methods  and 
store  the  necessary  information  in  tables.  One  could  then  map  a  general  block  in  the  network  onto  one 
of  the  standard  primitives  and  obtain  the  statistical  timing  information  through  a  look-up  table.  This 
approach is very similar to the operation of the delay operator in MOSTIM, and it needs further
investigation.


APPENDIX I


PLOTS  OF  DELAY  FUNCTIONS 


In this appendix, we will show plots of the inertial delay, Δt1, and the rise/fall delay, Δt2, in
both types, "0" and "1", as functions of the input slew-rate A^, for standard primitives 1, 2, and 3, and
as functions of β and γ, in addition, for standard primitives 4 and 5 for the following technology.


VTOE = +1.0 V
VTOD = -3.0 V
Vdd = +5.0 V
KP = 100 μA/V^2


Standard devices :

Load   : W = 5 μ,  L = 10 μ
Driver : W = 10 μ, L = 5 μ
Pass   : W = 10 μ, L = 10 μ


Standard capacitances :

C1s = 0.01 pF
C2s = 0.01 pF
C3s = 0.01 pF
C4s = 0.10 pF
C5s = 0.10 pF


Dimensionless parameters :

β ∈ {0.1, 1.0, 5.0}
γ ∈ {0.1, 1.0, 10.0}
δ = 4.0


The plots are shown in Figures A1.1 to A1.11. The delay values in all these plots are in
nanoseconds.


Figure A1J : Rise delay for standard primitive 4, type "0"


APPENDIX II


MINIMUM  FEEDBACK  ARC  SETS  FOR  DIRECTED  GRAPHS 

A  minimum  feedback  arc  set  for  a  directed  graph  is  a  minimum  set  of  arcs  which  if  removed 
leaves  the  resultant  graph  free  of  directed  cycles.  This  problem  has  attracted  the  interest  of  both 
mathematicians and engineers in recent years. Feedback is inherent in most engineering applications,
such as sequential switching circuits, control mechanisms, regulatory devices, and large-scale
systems. A good deal of success has been achieved, however, in analyzing complicated systems without
feedback.  Therefore,  in  order  to  analyze  systems  with  feedback,  an  appropriate  number  of  feedback 
loops  are  broken  to  reduce  the  system  to  one  without  feedback.  The  complexity  of  this  analysis,  on  the 
other  hand,  increases  drastically  with  the  number  of  loops  to  be  broken;  hence,  a  knowledge  of 
minimum  feedback  arc  sets  would  be  extremely  useful. 

The problem of finding minimum feedback arc sets (FAS) in directed graphs, when phrased as a
decision problem, is known to be NP-Complete [52,53,57]. This problem remains NP-Complete even for
a restricted class of graphs such as line-digraphs [58]. In this appendix we shall study an algorithm
proposed by D.H. Younger [60] which attempts to solve the FAS problem by establishing a relationship
between feedback arc sets and orderings on the vertices of a digraph. It will be shown that this
algorithm does indeed find a minimum feedback arc set in any digraph, but could take exponential time, as
should be expected, on certain digraphs. The problem of finding minimum feedback arc sets for planar
graphs, however, has been shown to be solvable in polynomial time [59].

We begin by establishing a relationship between feedback arcs of a digraph and orderings on its
vertex set. In fact, a minimum feedback arc set is shown to be determined by an optimum ordering R
of vertices which minimizes the number of arcs (u,v) such that R(u) ≥ R(v). A key concept used to find
optimum orderings is that of an admissible ordering [60]. While finding optimum orderings may be
hard (since the problem is NP-Complete), we will show that finding admissible orderings is
much easier, since it can be done in polynomial time. For most digraphs of interest to the practical user,
admissible orderings turn out to be almost as "good" as optimum orderings in that they generate "fairly
small" feedback arc sets.

We begin with some definitions and notations. For a directed graph, a feedback arc set is a set of
arcs which, if removed, leaves the resultant graph free of directed cycles. A feedback arc set is
minimum if no other feedback arc set for that digraph has fewer arcs. For any sequential ordering R on
the vertices of a digraph G(V,A), let F_R = {(u,v) ∈ A such that R(u) ≥ R(v)} designate the feedback arc
set determined by R. Also let Q(R) = |F_R|.
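These definitions translate directly into a few lines of code. The sketch below is an illustration only and takes the comparison in the definition to be R(u) ≥ R(v), i.e., the feedback arcs are those that do not point forward in the ordering. The function names are our own, and the exhaustive search over all orderings is included only to make the notion of an optimum ordering concrete on very small digraphs.

```python
from itertools import permutations


def feedback_arcs(arcs, order):
    """Arcs (u, v) with R(u) >= R(v), where R is the position in `order`."""
    rank = {v: i for i, v in enumerate(order)}
    return [(u, v) for (u, v) in arcs if rank[u] >= rank[v]]


def q(arcs, order):
    """Q(R): the number of feedback arcs determined by the ordering."""
    return len(feedback_arcs(arcs, order))


def optimum_ordering(vertices, arcs):
    """Exhaustive search over all orderings; feasible only for tiny digraphs."""
    return min(permutations(vertices), key=lambda order: q(arcs, order))


# A directed three-cycle: every ordering leaves at least one feedback arc.
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
```

For the three-cycle, the ordering a, b, c determines the single feedback arc (c,a), and no ordering does better, so it is an optimum ordering.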

Definition : A sequential ordering R* is said to be an optimum ordering if Q(R*) ≤ Q(R) for all
sequential orderings R. Given a sequential ordering R of a digraph, a consecutive subgraph is an induced
subgraph on any (non-empty) set of vertices that are consecutively ordered by R.

We  are  now  ready  to  state  some  properties  of  optimum  orderings  from  [60]. 

Theorem A2.1 : A feedback arc set F of a digraph G is minimum if and only if there exists an
optimum ordering R such that F = F_R.

Proof  :  See  [60]. 

The  above  theorem  clearly  illustrates  the  equivalence  between  optimum  orderings  and  minimum 
feedback  arc  sets  of  a  digraph.  Hence  the  problem  of  finding  optimum  orderings  is  indeed  NP-Complete. 

Theorem  A2.2  :  The  set  of  optimum  orderings  for  a  given  digraph  is  invariant  under  the  removal  of 
self-loops  and  directed  cycles  involving  two  arcs. 

Proof :  See  [60]. 

In  accordance  with  the  above  theorem,  two  digraphs  are  said  to  be  order  equivalent  if  the  removal 
of  all  self-loops  and  two-cycles  from  each  digraph  results  in  isomorphic  graphs.  A  subgraph  of  a 


digraph  obtained  by  removing  all  self-loops  and  two-cycles  is  called  the  reduced  graph.  Therefore,  an 
optimum  ordering  for  a  digraph  is  also  an  optimum  ordering  for  its  reduced  graph.  It  must  be  noted, 
however,  that  a  minimum  feedback  arc  set  of  a  reduced  graph  is  only  a  subset  of  some  minimum  feed¬ 
back  arc  set  of  the  original  graph. 
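The construction of the reduced graph can be sketched as follows. We assume here that the digraph may contain parallel arcs and that removing a two-cycle means cancelling one arc (u,v) against one opposite arc (v,u); this reading, the function name, and the list-of-pairs representation are assumptions made for the illustration.

```python
from collections import Counter


def reduced_graph(arcs):
    """Remove self-loops and cancel two-cycles in a list of arcs (u, v).

    Parallel arcs are allowed; each arc (u, v) is cancelled against one
    opposite arc (v, u) for as long as both directions are present.
    """
    count = Counter((u, v) for u, v in arcs if u != v)
    reduced = []
    for (u, v), n in count.items():
        surplus = n - count.get((v, u), 0)
        reduced.extend([(u, v)] * max(surplus, 0))
    return reduced
```

Every ordering must count each self-loop and exactly one arc of each two-cycle as a feedback arc, so removing them lowers Q(R) by the same amount for every ordering R and leaves the set of optimum orderings unchanged.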

Theorem A2.3 : Given an optimum ordering R of a digraph G(V,A), let G_1 be any consecutive
subgraph of G according to R, and define

F_{1,R} = {(u,v) : R(u) ≥ R(v) and u,v ∈ V(G_1)} and F_{2,R} = F_R − F_{1,R}. Then

(a) F_{1,R} must be a minimum feedback arc set of G_1;

(b) F_{2,R} must be a minimum feedback arc set of the subgraph H obtained from G by deleting all arcs
and coalescing all vertices of G_1.

Proof : See [60].

It  follows  from  part  a)  of  the  above  theorem  that  for  an  optimum  ordering  R  on  a  digraph  G,  for 
any two vertices u and v such that R(v) = R(u)+1, the number of arcs from u to v is no less than the
number from v to u. In fact, a much stronger result follows.

Notation : Suppose G_1 and G_2 are two disjoint induced subgraphs of a digraph G. We use (G_1,G_2) to
denote the set of arcs in G with tail vertex in G_1 and head vertex in G_2. Given an ordering R, two
disjoint consecutive subgraphs G_1 and G_2 are said to form an R-adjacent pair, denoted by [G_1,G_2], if

min{R(v) : v ∈ G_2} = max{R(u) : u ∈ G_1} + 1.

Theorem A2.4 : Given an optimum ordering R for a digraph G, let [G_1,G_2] be an R-adjacent pair of
disjoint consecutive subgraphs of n_1 and n_2 vertices, respectively. Then

a) |(G_1,G_2)| ≥ |(G_2,G_1)|, and

b) if |(G_1,G_2)| = |(G_2,G_1)|, then the ordering R', obtained from R as follows, is also optimum :

R'(u) = R(u) if u is neither in G_1 nor in G_2,
R'(u) = R(u) − n_1 if u ∈ G_2,
R'(u) = R(u) + n_2 if u ∈ G_1.

Proof : See [60].
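The exchange of an R-adjacent pair used in part b) can be sketched as follows. The ordering is represented as a dictionary from vertices to positions, and the two blocks as sets of vertices; these representations and the function names are assumptions. The sketch checks only the adjacency condition and trusts the caller that each block is itself consecutive.

```python
def swap_adjacent(R, g1, g2):
    """Exchange an R-adjacent pair [G1, G2] of consecutive vertex blocks.

    R maps each vertex to its position; g1 and g2 are the vertex sets.
    Vertices of G2 move back by |G1| and vertices of G1 move forward by |G2|.
    """
    if min(R[v] for v in g2) != max(R[u] for u in g1) + 1:
        raise ValueError("G1 and G2 are not R-adjacent")
    n1, n2 = len(g1), len(g2)
    swapped = {}
    for u, pos in R.items():
        if u in g2:
            swapped[u] = pos - n1
        elif u in g1:
            swapped[u] = pos + n2
        else:
            swapped[u] = pos
    return swapped


def arcs_between(arcs, src, dst):
    """|(G1, G2)|: number of arcs with tail in `src` and head in `dst`."""
    return sum(1 for u, v in arcs if u in src and v in dst)
```

Only the arcs between the two blocks change direction relative to the ordering, so the exchange changes Q(R) by |(G_1,G_2)| − |(G_2,G_1)|. This is why the exchange preserves optimality when the two counts are equal.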

Definition  :  A  feedback  arc  set  for  a  digraph  is  minimal  if  it  contains  no  proper  subset  that  is  also  a 
feedback  arc  set  for  this  graph. 

Definition  :  An  ordering  R  for  a  digraph  G  is  said  to  be  admissible  if 

a) The condition $|(G_1,G_2)| \ge |(G_2,G_1)|$ is satisfied by all R-adjacent pairs $[G_1,G_2]$ of disjoint
consecutive subgraphs of G, and

b) The feedback arc set $F_R$ determined by R is minimal.
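Both conditions can be tested by brute force on small graphs. The sketch below is only an illustration under stated assumptions: arcs are (tail, head) pairs, R maps each vertex to its position 1..n, and all function names are hypothetical. Condition b) is checked through the observation that a feedback arc set is minimal exactly when restoring any single feedback arc (u,v) closes a cycle, i.e. when v reaches u along forward arcs.

```python
from itertools import combinations


def has_path(arcs, src, dst):
    """Depth-first search: is dst reachable from src along `arcs`?"""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    stack, seen = [src], {src}
    while stack:
        x = stack.pop()
        if x == dst:
            return True
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def satisfies_condition_a(arcs, R):
    """Check |(G1,G2)| >= |(G2,G1)| for every R-adjacent pair [G1,G2]."""
    order = sorted(R, key=R.get)              # vertices by position
    n = len(order)
    # Cut points i < j < k give V1 = order[i:j] and V2 = order[j:k].
    for i, j, k in combinations(range(n + 1), 3):
        V1, V2 = set(order[i:j]), set(order[j:k])
        s1 = sum(1 for u, v in arcs if u in V1 and v in V2)
        s2 = sum(1 for u, v in arcs if u in V2 and v in V1)
        if s1 < s2:
            return False
    return True


def satisfies_condition_b(arcs, R):
    """F_R is minimal iff every feedback arc (u, v) has a forward path v -> u."""
    feedback = [(u, v) for u, v in arcs if R[u] >= R[v]]
    forward = [(u, v) for u, v in arcs if R[u] < R[v]]
    return all(has_path(forward, v, u) for u, v in feedback)


def is_admissible(arcs, R):
    return satisfies_condition_a(arcs, R) and satisfies_condition_b(arcs, R)
```

For the directed triangle a -> b -> c -> a, the ordering a, b, c passes both tests, while the reversed ordering c, b, a already fails condition a) on the pair [{c},{b}].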

By definition and by Theorem A2.4 a), it is clear that all optimum orderings of G are also
admissible. However, there might be admissible orderings that are not optimum. We shall show that,
starting from any arbitrary ordering of a digraph, it is possible to obtain an admissible ordering in
polynomial time. Hence, for any class of digraphs in which every admissible ordering is also an optimum
ordering, the problem of finding minimum feedback arc sets is indeed solvable in polynomial time.

The strategy we wish to employ to find optimum orderings is to start with any arbitrary ordering
and first obtain an admissible ordering. The vertices of the graph are relabeled as a', b', c', ... according
to this new ordering, which we will refer to as the admissible reference ordering. This ordering is
then selectively perturbed to obtain a new admissible ordering with fewer feedback arcs (if one exists),
which then becomes the admissible reference, and the process is repeated until an optimum ordering is
found. We need some more terminology and results before going into the description of the entire
algorithm.

Definition : Two sequential orderings of a digraph are F-identical if they determine the same feedback
arc set. An F-identical class of orderings is a set of orderings all of which are F-identical. Given an
admissible reference ordering $R_{ref}$ and an F-identical class $\Phi$, the ordering in $\Phi$ that is
lexicographically closest to $R_{ref}$ is said to be the F-representative of $\Phi$. Given a digraph with vertices labeled
according to some admissible reference ordering $R_{ref}$ and given any arbitrary ordering R, a sequent
derived from R is an ordered pair of vertices [u,v] for which R(v)=R(u)+1. If, further,
$R_{ref}(u) < R_{ref}(v)$ then [u,v] is an up-sequent; if $R_{ref}(u) > R_{ref}(v)$ then [u,v] is a down-sequent.
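As a small illustration of these definitions, the sketch below lists the sequents of an ordering and splits them into up-sequents and down-sequents relative to a reference. The dictionary representation of orderings, the function names, and the five-vertex example are assumptions made for the illustration.

```python
def sequents(R):
    """All ordered pairs [u, v] with R(v) = R(u) + 1."""
    order = sorted(R, key=R.get)
    return list(zip(order, order[1:]))


def classify_sequents(R, R_ref):
    """Split the sequents of R into (up-sequents, down-sequents) w.r.t. R_ref."""
    up, down = [], []
    for u, v in sequents(R):
        if R_ref[u] < R_ref[v]:
            up.append((u, v))
        else:
            down.append((u, v))
    return up, down


# Hypothetical example: reference a, b, c, d, e and the ordering b, e, c, a, d.
R_ref = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
R = {"b": 1, "e": 2, "c": 3, "a": 4, "d": 5}
up, down = classify_sequents(R, R_ref)
```

Here [b,e] and [a,d] are up-sequents, while [e,c] and [c,a] are down-sequents.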

Theorem A2.5 : In a reduced graph G whose vertices are labeled according to an admissible reference
ordering $R_{ref}$, given an F-identical class $\Phi$ with an admissible F-representative $R_F$, there exists one or
more arcs (u,v) in G for every down-sequent [u,v] derived from $R_F$.

Note : The arcs from u to v are forward arcs under $R_F$ but are feedback arcs under $R_{ref}$.

Proof : (See [60]). Since G is a reduced graph, there cannot be arcs both from u to v and from v to u.
So, if we eliminate the possibilities of one or more arcs from v to u, or no arcs between u and v in G,
then we are done.

Suppose G has one or more arcs from v to u. Since [u,v] is a sequent derived from $R_F$ (i.e.,
$R_F(v)=R_F(u)+1$), reversing the order of u and v in $R_F$ produces an ordering with a feedback arc set
that is a proper subset of that determined by $R_F$, thereby contradicting the minimality and, hence, the
admissibility of $R_F$. Now suppose that there are no arcs between u and v in G. The ordering produced
from $R_F$ by switching the positions of u and v is then lexicographically closer to the reference $R_{ref}$
than $R_F$, while having the same set of feedback arcs (i.e., the new ordering is also in $\Phi$), which
contradicts the designation of $R_F$ as the F-representative of $\Phi$. □

We now begin by describing an algorithm which, for any given directed graph G(V,A) and some
arbitrary initial ordering R, obtains an admissible ordering $R_A$. The algorithm to find optimum
orderings then treats $R_A$ as a reference and selectively perturbs it to obtain a better ordering. This procedure
is iterated until an optimum ordering is obtained.


Main program

INPUT : Reduced graph G(V,A) and initial ordering R_0
OUTPUT : An optimum ordering R_opt of G.

BEGIN

1) ADMISSIBLE (G(V,A), R_0, R)

2) p ← 0
   OPTIMUM (G(V,A), R, R', p)
   IF p=1 THEN
      R ← R'
      GO TO step 2)
   ELSE
      R_opt ← R
      STOP
   ENDIF

END


subroutine ADMISSIBLE (G(V,A), R, R_A)

BEGIN

1) i ← 0; R_0 ← R; n ← |V|

2) CONSEC (G(V,A), R_i, R*, p)

3) MINIMAL (G(V,A), R*, R_{i+1}, q)

4) IF p=1 or q=1 THEN
      i ← i+1
      GO TO step 2)
   ELSE
      R_A ← R_i
      RETURN R_A
   ENDIF

END


subroutine CONSEC (G(V,A), R, R', p)

BEGIN

1) p ← 0

2) Relabel vertices of G as v_1, v_2, ..., v_n
   such that R(v_i)=i for each i=1,2,...,n

3) FOR i=1 TO n DO
      R'(v_i) ← R(v_i)

4) FOR i=1 TO n-1 DO
   BEGIN
      FOR j=i+1 TO n DO
      BEGIN
         FOR k=j+1 TO n+1 DO
         BEGIN
5)          V_1 ← {v_i, v_{i+1}, ..., v_{j-1}}
            V_2 ← {v_j, ..., v_{k-1}}
            n_1 ← |V_1|
            n_2 ← |V_2|
6)          G_1 ← G[V_1]
            G_2 ← G[V_2]
            s_1 ← |(G_1,G_2)|
            s_2 ← |(G_2,G_1)|
7)          IF s_1 < s_2 THEN
               p ← 1
               FOR m=i TO j-1 DO
                  R'(v_m) ← R(v_m)+n_2
               FOR m=j TO k-1 DO
                  R'(v_m) ← R(v_m)-n_1
               RETURN R'
            ENDIF
         END
      END
   END

END
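The CONSEC step can be sketched in a few lines of Python. This is an illustration rather than the original implementation: arcs are assumed to be (tail, head) pairs, R a dictionary of positions 1..n, and the function returns the pair (R', p) instead of using output parameters. The loop bounds are chosen so that every R-adjacent pair is visited.

```python
def consec(arcs, R):
    """Return (R_new, p). p = 1 if some R-adjacent pair [G1, G2] had fewer
    arcs G1 -> G2 than G2 -> G1 and was swapped; otherwise p = 0."""
    order = sorted(R, key=R.get)      # order[t - 1] is the vertex at position t
    n = len(order)
    R_new = dict(R)
    # 0-based cut points i < j < k: V1 = order[i:j], V2 = order[j:k].
    for i in range(n - 1):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                V1, V2 = set(order[i:j]), set(order[j:k])
                s1 = sum(1 for u, v in arcs if u in V1 and v in V2)
                s2 = sum(1 for u, v in arcs if u in V2 and v in V1)
                if s1 < s2:
                    for u in V1:                  # V1 moves right by |V2|
                        R_new[u] = R[u] + len(V2)
                    for u in V2:                  # V2 moves left by |V1|
                        R_new[u] = R[u] - len(V1)
                    return R_new, 1
    return R_new, 0
```

For the triangle a -> b -> c -> a with the ordering c, b, a, the first violating pair is [{c},{b}]; swapping it gives b, c, a, which reduces the number of feedback arcs from two to one, and a second call reports p = 0.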


subroutine MINIMAL (G(V,A), R, R', q)

BEGIN

1) q ← 0

2) FOR EACH vertex v∈V DO
      R'(v) ← R(v)

3) F_1 ← {(u,v)∈A : R(u) ≥ R(v)}

4) G' ← G - F_1

5) F_2 ← F_1

6) FOR EACH arc a∈F_1 DO
   BEGIN
      G' ← G' + a
      IF G' is still acyclic THEN
         F_2 ← F_2 - a
         q ← 1
      ELSE
         G' ← G' - a
      ENDIF
   END

7) IF q=1 THEN
      R' ← topological ordering on G'
   ENDIF
   RETURN R'

END
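A possible Python rendering of MINIMAL is given below as a sketch. It assumes arcs are (tail, head) pairs and R maps vertices to positions; the acyclicity test and the final topological ordering both use Kahn's algorithm, which is a choice made here and is not specified in the listing.

```python
def topological_order(vertices, arcs):
    """Kahn's algorithm; returns a vertex list, or None if `arcs` has a cycle."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    out = []
    while ready:
        x = ready.pop()
        out.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)
    return out if len(out) == len(vertices) else None


def minimal(vertices, arcs, R):
    """Return (R_new, q); q = 1 if the feedback arc set of R was not minimal."""
    F1 = [(u, v) for u, v in arcs if R[u] >= R[v]]
    kept = [(u, v) for u, v in arcs if R[u] < R[v]]   # G' = G - F1, acyclic
    q = 0
    for a in F1:
        if topological_order(vertices, kept + [a]) is not None:
            kept.append(a)        # arc a is not needed in the feedback arc set
            q = 1
    if q == 0:
        return dict(R), 0
    order = topological_order(vertices, kept)
    return {v: pos for pos, v in enumerate(order, start=1)}, 1
```

On the triangle a -> b -> c -> a with ordering c, b, a, the feedback arc (a,b) can be restored without closing a cycle, so the routine returns q = 1 and the ordering c, a, b with the single feedback arc (b,c).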


subroutine OPTIMUM (G(V,A), R, R', p)

BEGIN

1) Relabel vertices of G according to R; R' ← R

2) i ← 1; G_1 ← G; R_1 ← R; I_1 ← 0;
   Q_1 ← |{(u,v)∈A(G) : R_1(u) ≥ R_1(v)}|;

3) TREE (i, M)

4) NEXTSON (i, j, noson)

5) IF noson = 1 THEN
      IF i=1 THEN
         p ← 0
         R' ← R
         RETURN R'
      ELSE
         i ← FATHER(i)
         GO TO step 4)
      ENDIF
   ELSE
      i ← j
   ENDIF

6) IF Q_i < M - I_i THEN
      p ← 1
      R' ← R_i
      RETURN R'
   ELSE
      GO TO step 3)
   ENDIF

END


subroutine TREE (i, M)

BEGIN

1) F ← {(u,v)∈A(G_i) : R_i(u) ≥ R_i(v)}

2) FOR EACH arc (u,v)∈F DO
   BEGIN
      UNITE (u, v, G_i, R_i, G', R', I')
      IF M - (I_i+I') > 0 THEN
         ntree ← ntree + 1
         j ← ntree
         I_j ← I_i + I'
         G_j ← G'
         ADMISSIBLE (G', R', R_j)
         Q_j ← |{(x,y)∈A' : R_j(x) ≥ R_j(y)}|
         FATHER(j) ← i
         SON(i) ← SON(i) ∪ {j}
      ENDIF
   END

   RETURN

END


The subroutine ADMISSIBLE starts with an ordering $R_i$ and calls CONSEC to check if it satisfies
condition a) of admissibility. If it does (indicator p=0) then there is no change; however, if not
(indicator p=1), then an intermediate ordering $R_{in}$ (denoted R* in the listing) is produced with fewer
feedback arcs. Subroutine MINIMAL is then called to check for minimality of the feedback arc set $F_1$ of
$R_{in}$. If $F_1$ is found minimal (indicator q=0) then, again, there is no change; otherwise (indicator q=1),
the minimal proper subset $F_2$ is found and a new ordering $R_{i+1}$ with this as its feedback arc set is
obtained. If either p=1 or q=1, then $Q(R_{i+1}) < Q(R_i)$, in which case i is incremented by 1 and the
process is repeated. In fact, $Q(R_{i+1}) = Q(R_i)$ if and only if both p=0 and q=0, in which case the program halts.
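The iteration just described can be sketched end to end. The code below is a compact, self-contained illustration under the following assumptions: arcs are (tail, head) pairs, orderings are dictionaries of positions 1..n, and the helper names (topo, consec, minimal, admissible, q_count) are introduced here for the example only.

```python
def q_count(arcs, R):
    """Q(R): number of feedback arcs, i.e. arcs (u, v) with R(u) >= R(v)."""
    return sum(1 for u, v in arcs if R[u] >= R[v])


def topo(vertices, arcs):
    """Kahn's algorithm; a topological order, or None if there is a cycle."""
    indeg = {v: 0 for v in vertices}
    for _, v in arcs:
        indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    out = []
    while ready:
        x = ready.pop()
        out.append(x)
        for u, v in arcs:
            if u == x:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return out if len(out) == len(vertices) else None


def consec(arcs, R):
    """Swap the first R-adjacent pair violating condition a), if any."""
    order = sorted(R, key=R.get)
    n = len(order)
    for i in range(n - 1):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                V1, V2 = order[i:j], order[j:k]
                s1 = sum(1 for u, v in arcs if u in V1 and v in V2)
                s2 = sum(1 for u, v in arcs if u in V2 and v in V1)
                if s1 < s2:
                    new_order = order[:i] + V2 + V1 + order[k:]
                    return {v: t for t, v in enumerate(new_order, 1)}, 1
    return dict(R), 0


def minimal(arcs, R):
    """Shrink the feedback arc set of R to a minimal one, if it is not already."""
    vertices = list(R)
    kept = [a for a in arcs if R[a[0]] < R[a[1]]]
    q = 0
    for a in arcs:
        if R[a[0]] >= R[a[1]] and topo(vertices, kept + [a]) is not None:
            kept.append(a)
            q = 1
    if q == 0:
        return dict(R), 0
    return {v: t for t, v in enumerate(topo(vertices, kept), 1)}, 1


def admissible(arcs, R):
    """Alternate CONSEC and MINIMAL until both indicators are 0."""
    R_i = dict(R)
    while True:
        R_mid, p = consec(arcs, R_i)
        R_next, q = minimal(arcs, R_mid)
        if p == 0 and q == 0:
            return R_i
        R_i = R_next
```

Starting from the ordering c, b, a of the triangle a -> b -> c -> a (two feedback arcs), the loop stops after one improving iteration at the ordering b, c, a with one feedback arc.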

Theorem A2.6 : Given a digraph G(V,A) with n=|V| and $\alpha$=|A| and any initial sequential ordering,
the subroutine ADMISSIBLE halts at an admissible ordering, and the number of computations involved
is bounded above by a polynomial $P(n,\alpha)$ in n and $\alpha$.

Proof : Let $R_0, R_1, R_2, \ldots, R_i, \ldots$ be the sequence of orderings produced during each iteration of
subroutine ADMISSIBLE. Let $m_i = Q(R_i)$ be the number of feedback arcs determined by $R_i$. Since
$m_i \ge 0$ for each i, there exists a smallest integer s such that $m_s = m_{s+1}$ and $m_i > m_{i+1}$ for each
$0 \le i < s$. Therefore the program halts after s iterations. At this stage both indicators p and q must be 0,
which means that $R_s$ must be admissible. Clearly $s \le m_0 \le \alpha$; therefore, the number of iterations is at
most the number of arcs in G.

During each call, steps 2) and 3) of CONSEC together involve at most 2n computations, while
steps 5), 6), and 7) require at most $(2n+\alpha)$ computations for each R-adjacent pair $[G_1,G_2]$.

Lemma : Given a digraph G(V,A) with n=|V|, and an ordering R, the number of R-adjacent pairs
$[G_1,G_2]$ of disjoint consecutive subgraphs of G is (n+1)n(n-1)/6.

Proof : Relabel the vertices of G as $v_1, v_2, \ldots, v_n$ according to R. Arrange n dots labeled 1,2,...,n
on a straight line in ascending order from left to right. Place dummy dots 0 on the left of 1 and n+1
to the right of n. We now have a linear arrangement of n+2 dots creating n+1 empty spaces between
them. If we pick any three spaces among the n+1 empty spaces and place a slash (/) in each of them,
then we can associate $V_1$ with the vertices corresponding to dots between the first and second slashes
and $V_2$ with those between the second and third slashes. $G_1$ and $G_2$ are then the consecutive subgraphs
of G induced by $V_1$ and $V_2$, respectively. There are $\binom{n+1}{3} = (n+1)n(n-1)/6$ ways to choose the
three spaces, and each choice gives a distinct R-adjacent pair. Hence the proof of the lemma.
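The counting argument can be checked numerically. The sketch below enumerates the pairs by choosing three of the n+1 gaps, exactly as in the proof; the function name and the representation of each subgraph by its list of positions are assumptions made for the illustration.

```python
from itertools import combinations


def adjacent_pairs(n):
    """Position lists (V1, V2) of all R-adjacent pairs on n ordered vertices."""
    pairs = []
    # Gap g (0 <= g <= n) lies between dot g and dot g + 1.
    for i, j, k in combinations(range(n + 1), 3):
        V1 = list(range(i + 1, j + 1))    # dots between first and second slash
        V2 = list(range(j + 1, k + 1))    # dots between second and third slash
        pairs.append((V1, V2))
    return pairs


# Compare the enumeration with the closed form of the Lemma for small n.
counts = {n: len(adjacent_pairs(n)) for n in range(2, 9)}
```

For n = 5, for instance, the enumeration produces 20 pairs, in agreement with (n+1)n(n-1)/6 = 6·5·4/6.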

Therefore, the total number of computations performed during each call to CONSEC is at most
$2n+(2n+\alpha)(n^3-n)/6$. In subroutine MINIMAL, step 6) requires at most $n\alpha$ computations, while
the other steps need at most $n+3\alpha$ computations. Thus each iteration of ADMISSIBLE performs
at most $Q(n,\alpha)=2n+(2n+\alpha)(n^3-n)/6+n+3\alpha+n\alpha$ computations. Since the number of iterations is
at most $\alpha$, we have $P(n,\alpha)=\alpha\,Q(n,\alpha)$ as the upper bound on the total number of computations involved
in obtaining an admissible ordering for G. □
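To get a feeling for the size of this bound, it can be evaluated for concrete values. The following sketch simply transcribes the two expressions; the function names and the sample values n = 5, alpha = 7 are chosen here for illustration.

```python
def q_bound(n, alpha):
    """Per-iteration bound Q(n, alpha) from the proof of Theorem A2.6."""
    # n**3 - n is a product of three consecutive integers, so it is divisible by 6.
    consec_cost = 2 * n + (2 * n + alpha) * (n**3 - n) // 6
    minimal_cost = n + 3 * alpha + n * alpha
    return consec_cost + minimal_cost


def p_bound(n, alpha):
    """Overall bound P(n, alpha) = alpha * Q(n, alpha)."""
    return alpha * q_bound(n, alpha)
```

For a graph with 5 vertices and 7 arcs this gives Q = 411 and P = 2877, dominated by the (2n+alpha)(n^3-n)/6 term contributed by CONSEC.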

We now consider the algorithm to find an optimum ordering of a digraph. By Theorem A2.2 we
need consider only reduced graphs. So, for a reduced graph G(V,A) with some arbitrary initial
ordering, an admissible reference ordering R is first obtained. For each feedback arc of R, a cyclic shift by
one order position is performed on the vertices of a consecutive subgraph, where the subgraph has,
before the shift, the feedback arc connecting its two extreme vertices. Of the two possible directions for
this cyclic shift, it is convenient to choose the one which results in fewer feedback arcs. The shift
produces a new ordering which has a down-sequent corresponding to the feedback arc of R. Repeating this
for every feedback arc gives Q(R) new orderings, which are made admissible by passing them through
subroutine ADMISSIBLE. If one of these admissible orderings R' is better than R, i.e., Q(R')<Q(R), then
R' is treated as a new reference. If none of the initial perturbations establishes a new reference, then
each of them is selectively perturbed in a similar way, and thus the search branches out. It is clear that
we are only looking at orderings whose down-sequents are feedback arcs of R. The following result
justifies this approach.
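The perturbation step can be illustrated as follows. This is a minimal sketch, assuming an ordering is stored as a Python list of vertices and arcs as (tail, head) pairs; the function names and the example graph are hypothetical. Both shift directions place u immediately before v, and the one with fewer feedback arcs is returned.

```python
def q_count(arcs, order):
    """Number of feedback arcs of the ordering given as a vertex list."""
    pos = {v: t for t, v in enumerate(order)}
    return sum(1 for u, v in arcs if pos[u] >= pos[v])


def perturb(arcs, order, arc):
    """Cyclically shift the segment spanned by the feedback arc (u, v) so that
    [u, v] becomes a sequent; keep the direction with fewer feedback arcs."""
    u, v = arc
    lo, hi = order.index(v), order.index(u)   # v precedes u for a feedback arc
    assert lo < hi, "arc must be a feedback arc between distinct vertices"
    seg = order[lo:hi + 1]                    # seg[0] == v, seg[-1] == u
    left = order[:lo] + seg[1:] + seg[:1] + order[hi + 1:]     # v moves behind u
    right = order[:lo] + seg[-1:] + seg[:-1] + order[hi + 1:]  # u moves before v
    return min(left, right, key=lambda o: q_count(arcs, o))


# Hypothetical reduced graph on four vertices with feedback arc (d, a).
arcs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
order = ["a", "b", "c", "d"]
shifted = perturb(arcs, order, ("d", "a"))
```

In this example the left shift b, c, d, a has two feedback arcs and the right shift d, a, b, c has one, so the latter is chosen; in both cases [d,a] is a down-sequent with respect to the original ordering.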

Theorem A2.7 : Let G(V,A) be a reduced graph and R an admissible reference ordering. If R is not
optimum, then there exists an ordering R' with Q(R')<Q(R) such that every down-sequent of R'
corresponds to a feedback arc of R.

Proof : Label the vertices of G according to the reference ordering R. Let $R_0$ be an optimum ordering of
G. Since R is not optimum, $Q(R_0)<Q(R)$. Let $\Phi$ be the F-identical class containing $R_0$. Let R' be the
F-representative of $\Phi$. Since R' is optimum, it is also admissible. Also, R' cannot be the same as R since
Q(R')<Q(R), and so it must have at least one down-sequent. But by Theorem A2.5 every down-sequent
[u,v] of R' must correspond to an arc (u,v) in G. Since [u,v] is a down-sequent, we must have
R(u)>R(v), by definition. Hence (u,v) is a feedback arc of R. □

We now describe a limiting mechanism which keeps the search for better orderings from
becoming extremely unwieldy. It is useful to imagine a tree which grows from a root vertex (labeled 1).
Associated with each vertex i of this tree is a reduced graph $G_i$, an admissible ordering $R_i$ on the
vertices of $G_i$, $Q_i=Q(R_i)$, and an integer $I_i$ which indicates the difference between the minimum number of
feedback arcs of G and $G_i$. Initially $G_1=G$, $R_1=R$, $I_1=0$, and M=Q(R). The subroutine TREE(i,M)
creates 'sons' for vertex i in the tree as follows:

For each feedback arc of $G_i$ according to $R_i$, a cyclic shift is performed to establish the end points
of this arc as a down-sequent. The two vertices of this down-sequent are united into a single
vertex, and all self-loops and 2-cycles created by this union are eliminated, resulting in a reduced
graph G' and an ordering R'. Let I' be the number of 2-cycles thus eliminated. Each vertex of G'
thus corresponds to a consecutive subgraph of the original graph G. The down-sequent, say [u,v],
which gets united into a single vertex, gets an equivalent label which is the label of v appended
to the label of u. Thus, R' can also be treated as an ordering of G by reading off the labels of G' in
order according to R'. A 'son' j is created only if $M-(I_i+I')>0$, in which case $G_j=G'$, $R_j=R'$, and
$I_j=I_i+I'$. If $M-(I_i+I')\le 0$, then any ordering derived from R' by the
above procedure will have at least $I_i+I'$ arcs of G in its feedback arc set; therefore, an ordering
better than the original R can never be obtained this way.
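The uniting step can be sketched as below. Several details are assumptions of this illustration and not taken from the source: vertices are labeled by strings, arcs are (tail, head) pairs with repeats allowed, an ordering is a list of labels, and opposed arcs are cancelled one for one, each cancelled pair counting as one eliminated 2-cycle. The name unite mirrors the UNITE call in subroutine TREE, whose own definition is not reproduced here.

```python
from collections import Counter


def unite(arcs, order, u, v):
    """Merge the sequent [u, v] of `order` into one vertex labeled u + v.
    Returns (new_arcs, new_order, eliminated), where `eliminated` is the
    number of 2-cycles removed (the quantity called I' in the text)."""
    i = order.index(u)
    assert order[i + 1] == v, "u must immediately precede v"
    w = u + v                                  # label of v appended to label of u
    new_order = order[:i] + [w] + order[i + 2:]

    def rename(x):
        return w if x in (u, v) else x

    count = Counter((rename(a), rename(b)) for a, b in arcs)
    new_arcs, eliminated = [], 0
    for (a, b), m in sorted(count.items()):
        if a == b:
            continue                           # self-loop created by the union
        back = count.get((b, a), 0)
        if back and a < b:
            eliminated += min(m, back)         # count each opposed pair once
        new_arcs.extend([(a, b)] * (m - min(m, back)))
    return new_arcs, new_order, eliminated


# Hypothetical example: uniting u and v turns x -> u and v -> x into a 2-cycle.
arcs = [("u", "v"), ("v", "x"), ("x", "u"), ("u", "y")]
order = ["x", "u", "v", "y"]
new_arcs, new_order, eliminated = unite(arcs, order, "u", "v")
```

Here the arc (u,v) becomes a self-loop and is dropped, the arcs (v,x) and (x,u) form a 2-cycle between x and the merged vertex and are cancelled, and only the arc from the merged vertex to y survives.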

We would now like to make a few comments about the computational complexity of this
procedure. The number of iterations in the main algorithm is again at most the number of arcs of G, since
each successive ordering is better than the previous one. So if the computations within subroutine
OPTIMUM could be performed in polynomial time, then the entire algorithm would run in polynomial time.
This cannot be expected, since by Theorem A2.7 the algorithm terminates at an optimum ordering, while
the problem of obtaining one is known to be NP-complete. Indeed, if one examines the computations within
subroutine OPTIMUM, the only quantity that can grow exponentially with n=|V| is the number of vertices
of the tree. It would be interesting to find, for general n, a digraph on n vertices that exhibits such
growth. An upper bound on the depth of the tree is n-1, since the leaves of the tree correspond to
two-vertex digraphs and the digraph associated with a son has one vertex less than that associated with its
father. So, even bounding the number of sons by k gives us at most $k^n$ vertices in the tree, which does
not help.
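The count behind this last remark can be written out explicitly. In the short derivation below, N denotes the number of tree vertices (a symbol introduced here), and it is assumed that every tree vertex has at most k sons, with k at least 2, and that the depth is at most n-1 as stated above.

```latex
\[
  N \;\le\; \sum_{d=0}^{n-1} k^{d} \;=\; \frac{k^{n}-1}{k-1} \;<\; k^{n}
  \qquad (k \ge 2).
\]
```

The bound is exponential in n for any fixed k of at least 2, which is why limiting the number of sons alone does not yield a polynomial bound.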

We now illustrate the use of the above algorithm with an example. Consider the reduced graph
G(V,A) shown in Figure A2.1. The natural ordering a,b,c,d,e can easily be verified to be admissible.
Figure A2.2 shows the tree structure of the search for a better ordering. Note that [e,c,a] represents a
consecutive subgraph of G with three vertices e, c, and a appearing in that order. The search
terminates at a three-vertex digraph with indicator p=1, meaning that the ordering b,e,c,a,d is a better
ordering. Indeed, this new ordering has only two feedback arcs, which is one less than that of the
natural ordering. The vertices are relabeled as a',b',c',d',e' according to this new admissible reference.


Figure A2.1 : A reduced graph G(V,A)

Figure A2.3 : Tree structure with R' as the admissible reference